1  Statement  of  the  problems  studied.

In this research project we studied two important problems in probability and statistics. The first is a problem in the theory of large deviations, with applications to the robustness of statistical procedures, efficiencies and the bootstrap resampling method. More specifically, we have established the large deviation principle for a sequence of probability measures $\{\mu_n\}$ on a product space $\Omega_1 \times \Omega_2$ when the corresponding sequences of marginal and conditional distributions possess the large deviation property. We have used this result to study the large deviation behavior of the bootstrap resampling procedure, and also to study the robustness of location parameter tests in contaminated normal populations via Bahadur slopes and efficiencies.
The second problem is concerned with the statistical analysis of longitudinal data. In a seminal paper, Liang & Zeger (1986, Biometrika 73, 13-22) introduced the generalized estimating equations (GEE) as a statistical tool for analyzing longitudinal data. The GEE method uses a generalized quasi-score function to estimate the regression parameter, and moment estimates for the correlation parameters. Recently, Crowder (1995, Biometrika 82, 407-410) has pointed out some pitfalls with the estimation of the correlation parameters in the GEE method. In this research we developed an alternative estimation procedure which overcomes those pitfalls. This alternative method is known as quasi-least squares (QLS), since it uses a partial minimization based on the principle of (generalized) least squares. Below we give a brief outline of the technical details of the work done under this contract.

2  Summary  of  the  most  important  results. 

2.1  Large  deviations  for  joint  distributions. 

Let $\Omega$ be a Polish space, that is, a complete separable metric space, and let $\mathcal{B}$ be the Borel $\sigma$-field on $\Omega$ containing all the open and closed subsets of $\Omega$. A function $I(x) : \Omega \to [0, \infty]$ is said to be a rate function if it is lower semi-continuous. Let $\{\mu_n\}$ be a sequence of probability measures on $(\Omega, \mathcal{B})$. We say that $\{\mu_n\}$ obeys the large deviation principle (LDP) with rate function $I(x)$ if the following conditions are satisfied:

(1) $\limsup_{n} \frac{1}{n} \log \mu_n(C) \le -I(C)$

(2) $\liminf_{n} \frac{1}{n} \log \mu_n(G) \ge -I(G)$

for all closed sets $C$ and for all open sets $G$ of $\Omega$, where $I(A) = \inf\{I(x) : x \in A\}$ for a set $A$. The rate function $I(x)$ is known as a proper rate function if for each $L > 0$, the level set $\{x : I(x) \le L\}$ is a compact subset of $\Omega$. Let $(\Omega_1, \mathcal{B}_1)$, $(\Omega_2, \mathcal{B}_2)$ be two Polish spaces with their associated Borel $\sigma$-fields. Let $\{\mu_{1n}\}$ be a sequence of probability measures on $(\Omega_1, \mathcal{B}_1)$ and $\{\nu_n(x_1, B_2)\}$ be a sequence of transition functions on $\Omega_1 \times \mathcal{B}_2$. Consider a sequence of probability measures $\{\mu_n\}$ on the product space $(\Omega, \mathcal{B}) = (\Omega_1 \times \Omega_2, \mathcal{B}_1 \otimes \mathcal{B}_2)$ given by

$$\mu_n(B_1 \times B_2) = \int_{B_1} \nu_n(x_1, B_2)\, d\mu_{1n}(x_1)$$

for $B_i \in \mathcal{B}_i$, $i = 1, 2$. We say that the sequence of probability transition functions $\{\nu_n(x_1, \cdot),\ x_1 \in \Omega_1\}$ satisfies the LDP continuously in $x_1$ with rate function $J(x_1, x_2)$, or simply that the LDP continuity condition holds, if

(i) For each $x_1 \in \Omega_1$, $J(x_1, \cdot)$ is a proper rate function on $\Omega_2$.

(ii) For any sequence $\{x_{1n}\}$ in $\Omega_1$ such that $x_{1n} \to x_1$, the sequence of measures $\{\nu_n(x_{1n}, \cdot)\}$ on $\Omega_2$ obeys the LDP with rate function $J(x_1, \cdot)$, and

(iii) $J(x_1, x_2)$ is jointly lower semi-continuous in $(x_1, x_2)$.

Our main result can be stated as follows. Suppose that the sequence $\{\mu_{1n}\}$ obeys the LDP with proper rate function $I_1(x_1)$ and the sequence of probability transition functions $\{\nu_n(x_1, \cdot),\ x_1 \in \Omega_1\}$ satisfies the LDP continuously in $x_1$ with rate function $J(x_1, x_2)$. Then the sequence of joint distributions $\{\mu_n\}$ obeys the LDP with rate function $I(x_1, x_2) = I_1(x_1) + J(x_1, x_2)$. There are several interesting applications of this theorem in statistics. In particular, the theorem shows that the joint distribution of the ordinary empirical measure of a sample and the corresponding bootstrap empirical measure obeys the LDP in the weak topology. Other applications include establishing the LDP for several sampling distributions that arise naturally in statistics.
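The additive form of the joint rate function can be checked numerically in a simple case. The sketch below assumes a Gaussian pair that is not taken from the report: $X_1 \sim N(0, 1/n)$ with rate $I_1(x_1) = x_1^2/2$ and, given $X_1 = x_1$, $X_2 \sim N(x_1, 1/n)$ with rate $J(x_1, x_2) = (x_2 - x_1)^2/2$. Minimizing $I_1 + J$ over $x_1$ should then give $x_2^2/4$, the rate of the $N(0, 2/n)$ marginal of $X_2$. All function names are hypothetical.

```python
def marginal_rate(i1, j, x2, grid):
    """Approximate I2(x2) = inf over x1 of [I1(x1) + J(x1, x2)] on a finite grid."""
    return min(i1(x1) + j(x1, x2) for x1 in grid)


# Assumed illustrative rate functions (Gaussian case, not from the source).
def i1(x1):
    """Rate function of X1 ~ N(0, 1/n)."""
    return 0.5 * x1 * x1


def j(x1, x2):
    """Rate function of X2 given X1 = x1, where X2 ~ N(x1, 1/n)."""
    return 0.5 * (x2 - x1) ** 2


def joint_rate(x1, x2):
    """Joint rate function I(x1, x2) = I1(x1) + J(x1, x2)."""
    return i1(x1) + j(x1, x2)


# Grid over x1 used for the numerical infimum.
GRID = [k / 1000.0 for k in range(-5000, 5001)]

if __name__ == "__main__":
    for x2 in (0.5, 1.0, 2.0):
        # Second and third columns should agree up to grid error.
        print(x2, marginal_rate(i1, j, x2, GRID), x2 * x2 / 4.0)
```

The grid search is only a stand-in for the exact infimum; it works here because the minimizer $x_1 = x_2/2$ lies well inside the grid.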

2.2  Bahadur  slopes  of  tests  in  contaminated  models. 

One of the most basic problems in statistics is to test a null hypothesis concerning the location parameter, assuming that we have a random sample of $n$ observations from a normal population. Several test statistics are candidates for this testing problem: the mean test, the $t$ test, the sign test and the Wilcoxon test. Among these, it is well known that the $t$ test is the uniformly most powerful unbiased test if the normality assumption holds. But it is not clear that the $t$ test will continue to be most powerful if there is a departure from normality. To study the robustness of these tests, it has been standard practice to examine their performance under the Tukey model of contaminated alternatives. Under the Tukey model the sample consists of i.i.d. observations from the density

$$f(x) = (1 - \epsilon)\, \phi(x; \theta, 1) + \epsilon\, \phi(x; \theta, \sigma).$$

Here $\phi(x; \theta, \sigma)$ denotes the probability density function of a normal random variable with mean $\theta$ and standard deviation $\sigma$, and $\epsilon$ is a number between 0 and 1 representing the proportion of contamination. Two measures which are commonly used to compare the large sample properties of these tests are the Pitman efficiencies and the Bahadur slopes. Several authors have examined the robustness of the aforementioned test statistics by computing the Pitman efficiencies. But not much work has been done on the computation of Bahadur slopes and efficiencies, since it is a much harder problem.
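The Tukey contaminated-normal density above is straightforward to evaluate and sample from. The following is a minimal sketch; the function names and the parameter values in the usage lines are assumptions for illustration, not values used in the report.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal variable with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def tukey_pdf(x, theta, sigma, eps):
    """Mixture density (1 - eps) * phi(x; theta, 1) + eps * phi(x; theta, sigma)."""
    return (1.0 - eps) * normal_pdf(x, theta, 1.0) + eps * normal_pdf(x, theta, sigma)


def tukey_sample(n, theta, sigma, eps, rng):
    """Draw n i.i.d. observations: with probability eps use sd sigma, else sd 1."""
    return [rng.gauss(theta, sigma if rng.random() < eps else 1.0) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    sample = tukey_sample(10, theta=0.0, sigma=3.0, eps=0.1, rng=rng)
    print(sample)
    print(tukey_pdf(0.0, theta=0.0, sigma=3.0, eps=0.1))
```

Sampling picks the mixture component first and then draws from it, which matches the mixture density term by term.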

Deriving the Bahadur slopes is not an easy task and depends heavily on the theory of large deviations. In fact, the problem of calculating the Bahadur slopes of test statistics provided the impetus for the development of large deviation theory. Both the establishment of the LDP for a sequence of distributions and the identification of the rate function are essential for explicit calculations of Bahadur slopes. As an important application of the large deviation result described in Section 2.1, we have obtained the Bahadur slopes of the four test statistics in the Tukey model. From an examination of these slopes, it appears that the Wilcoxon test is the best performer in a neighborhood of the null hypothesis, even in the presence of moderate contamination, but is not the best performer uniformly over the whole region of the alternative hypothesis.

2.3  Quasi-least  squares. 

The statistical analysis of longitudinal discrete and continuous data has become an active research topic in recent years, and several books on the topic have been published. Such data occur naturally when repeated observations are taken on individuals, or when the data are taken on clusters or groups of subjects sharing similar characteristics. In a seminal paper, Liang and Zeger (1986, Biometrika 73, 13-22) introduced the generalized estimating equations (GEE) for analyzing longitudinal data. The main idea of Liang and Zeger (1986) is to model the dependence among the repeated measurements on each subject in the form of a "working correlation matrix" which is assumed to be a function of a vector $\alpha$ of parameters. An estimate of $\alpha$ is obtained using the Pearsonian residuals. The GEE method has become so popular that the 1986 article of Liang and Zeger has been included in Volume 3 of "Breakthroughs in Statistics." But recently Crowder (1995, Biometrika 82, 407-410) has pointed out some pitfalls with the estimation of the correlation parameters in the GEE method. First, the estimate of $\alpha$ based on the Pearsonian residuals may not fall within the set of feasible values, leading to a complete breakdown of the estimation procedure. Second, even if it is feasible, it may not be consistent, and it is subject to an uncertainty of definition which can lead to loss of efficiency of the regression parameter estimate. Furthermore, there can be no general asymptotic theory supporting existence or consistency of the joint distribution of the regression and the correlation parameter estimates.
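To make the feasibility issue concrete, here is a minimal sketch under an assumed exchangeable working correlation structure (ones on the diagonal, a common value $\alpha$ elsewhere). This structure is chosen here purely for illustration and is not singled out in the text. For an $n \times n$ matrix of this form the eigenvalues are $1 + (n-1)\alpha$ (once) and $1 - \alpha$ ($n-1$ times), so a value of $\alpha$ is feasible only when both are positive. The function names are hypothetical.

```python
def exchangeable_eigenvalues(alpha, n):
    """Eigenvalues of the n x n matrix with 1 on the diagonal and alpha elsewhere."""
    return [1.0 + (n - 1) * alpha] + [1.0 - alpha] * (n - 1)


def is_feasible(alpha, n):
    """True when the exchangeable matrix is positive definite, i.e. a valid correlation matrix."""
    return all(ev > 0.0 for ev in exchangeable_eigenvalues(alpha, n))


if __name__ == "__main__":
    n = 4  # assumed number of repeated measurements per subject
    for alpha in (-0.5, -0.3, 0.0, 0.5, 1.2):
        print(alpha, is_feasible(alpha, n))
```

A moment estimate that lands outside this range cannot be used to build a working correlation matrix, which is the kind of breakdown described above.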
In this research project we discovered a new approach for estimating the correlation parameter which overcomes all of the above pitfalls. We call this new approach the quasi-least squares (QLS) method. Not only does the QLS method yield a feasible estimate for the correlation parameter $\alpha$, it has several other advantages. For some commonly employed working correlation structures we have closed form solutions for the estimate of the correlation parameters. When the correlation matrix is unstructured, the QLS estimate of the correlation matrix involves a new factorization of a positive definite matrix. This factorization does not have a closed form solution, and in this research we have developed a recursive algorithm to obtain the factorization. Unlike the GEE method, the QLS method can accommodate a wide range of correlation structures that are useful for analyzing unbalanced and unequally spaced longitudinal data. While the QLS estimate of the regression parameter is consistent and asymptotically normal, the estimate of the correlation parameter is asymptotically biased. In this research we also obtained a modified QLS estimate of the correlation parameter which is consistent and asymptotically normal. For the structured correlation matrices the modified estimate is not only consistent but also robust among the popular working correlation structures. We have also developed some extensions of our results to the analysis of multivariate repeated measurements.
[...] Sethuraman [Sankhya A 23 (196[?]), ...] [...] for probability measures on product spaces re-establishes the main theorem of Dinwoodie and Zabell [Ann. Probab. 20 (1992), 1147-1166] and generalizes the LDP for product [...] [Ann. Probab. 15 (1987), 610-627]. Our [...] LDP [...] statistical distributions [...] methods.

1.  Introduction 

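Let $\Omega$ be a Polish space, that is, a complete separable metric space, and let $\mathcal{B}$ be the Borel $\sigma$-field on $\Omega$ containing all the open and closed subsets of $\Omega$. A function $I(x) : \Omega \to [0, \infty]$ is said to be a rate function if it is lower semi-continuous. Let $\{\mu_n\}$ be a sequence of probability measures on $(\Omega, \mathcal{B})$. We say that $\{\mu_n\}$ obeys the weak large deviation principle (WLDP) with rate function $I(x)$ if the following conditions are satisfied.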
(1) $\limsup_{n} \frac{1}{n} \log \mu_n(K) \le -I(K)$ ...(1.1)

(2) $\liminf_{n} \frac{1}{n} \log \mu_n(G) \ge -I(G)$ ...(1.2)

for all compact sets $K$ and for all open sets $G$ of $\Omega$. When $\{\mu_n\}$ satisfies (1.2) and also satisfies condition (1.1) for all closed sets $C$, we say that it obeys the large deviation principle (LDP). It is clear that if $\{\mu_n\}$ satisfies the LDP then it also satisfies the WLDP. The rate function $I(x)$ is known as a proper rate function if for each $L > 0$, the level set $\{x : I(x) \le L\}$ is a compact subset of $\Omega$. Note that proper rate functions are also rate functions, since a nonnegative function is lower semi-continuous if and only if the level sets are closed.
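As a hedged numerical illustration of the bounds (1.1) and (1.2), consider the mean of $n$ i.i.d. standard normal variables. This textbook case is assumed here and is not an example from the paper. Its law obeys the LDP with rate function $I(x) = x^2/2$, so for the closed set $C = [a, \infty)$ with $a > 0$ the quantity $\frac{1}{n} \log \mu_n(C)$ should approach $-a^2/2$ as $n$ grows. The function name `upper_tail_rate` is hypothetical.

```python
import math


def upper_tail_rate(n, a):
    """Return (1/n) * log P(mean of n i.i.d. N(0,1) variables >= a).

    The sample mean is N(0, 1/n), so the tail probability is
    0.5 * erfc(a * sqrt(n / 2)), evaluated here without simulation.
    """
    p = 0.5 * math.erfc(a * math.sqrt(n / 2.0))
    return math.log(p) / n


if __name__ == "__main__":
    a = 1.0
    for n in (10, 100, 1000):
        # The printed values should move toward -a**2 / 2 as n increases.
        print(n, upper_tail_rate(n, a), -a * a / 2.0)
```

For much larger $n$ the tail probability underflows in double precision, so this direct evaluation is only suitable for moderate $n$.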

The following definition of large deviation tightness, extensively used in large deviation theory, is useful when describing the parallels between weak convergence and the LDP, and also in simplifying several proofs. Our definition of large deviation tightness is the same as the definition of exponential tightness in Dembo and Zeitouni (1993).

Definition. A sequence of measures $\{\mu_n\}$ is large deviation tight (LD tight) if for each $N < \infty$, there exists a compact set $K_N$ such that

$$\limsup_{n} \frac{1}{n} \log \mu_n(K_N^c) \le -N. \qquad ...(1.3)$$


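Let $(\Omega_1, \mathcal{B}_1)$, $(\Omega_2, \mathcal{B}_2)$ be two Polish spaces with their associated Borel $\sigma$-fields. Let $\{\mu_{1n}\}$ be a sequence of probability measures on $(\Omega_1, \mathcal{B}_1)$ and $\{\nu_n(x_1, B_2)\}$ be a sequence of transition functions on $\Omega_1 \times \mathcal{B}_2$. Consider a sequence of probability measures $\{\mu_n\}$ on the product space $(\Omega, \mathcal{B}) = (\Omega_1 \times \Omega_2, \mathcal{B}_1 \otimes \mathcal{B}_2)$ given by

$$\mu_n(B_1 \times B_2) = \int_{B_1} \nu_n(x_1, B_2)\, d\mu_{1n}(x_1) \qquad ...(1.4)$$

for $B_i \in \mathcal{B}_i$, $i = 1, 2$.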

We say that the sequence of probability transition functions $\{\nu_n(x_1, \cdot),\ x_1 \in \Omega_1\}$ satisfies the LDP continuously in $x_1$ with rate function $J(x_1, x_2)$, or simply that the LDP continuity condition holds, if

(i) For each $x_1 \in \Omega_1$, $J(x_1, \cdot)$ is a proper rate function on $\Omega_2$.

(ii) For any sequence $\{x_{1n}\}$ in $\Omega_1$ such that $x_{1n} \to x_1$, the sequence of measures $\{\nu_n(x_{1n}, \cdot)\}$ on $\Omega_2$ obeys the LDP with rate function $J(x_1, \cdot)$.

(iii) $J(x_1, x_2)$ is lower semi-continuous as a function of $(x_1, x_2)$.
When (i) and (ii) alone hold, we say that the sequence of transition functions $\{\nu_n(x_1, \cdot),\ x_1 \in \Omega_1\}$ satisfies the exponential continuity condition with proper rate function $J(x_1, \cdot)$, following the definition given in (1.7) of Dinwoodie and Zabell (1992).

Suppose that the sequence $\{\mu_{1n}\}$ obeys the LDP with proper rate function $I_1(x_1)$ and the sequence of probability transition functions $\{\nu_n(x_1, \cdot),\ x_1 \in \Omega_1\}$ satisfies the LDP continuously in $x_1$ with rate function $J(x_1, x_2)$. Under these conditions, the main Theorem 2.3 of this paper shows that the sequence of joint distributions $\{\mu_n\}$ obeys the WLDP with rate function $I(x_1, x_2) = I_1(x_1) + J(x_1, x_2)$, and that the sequence of marginal distributions $\{\mu_{2n}\}$ on $\Omega_2$ obeys the LDP with rate function $I_2(x_2) = \inf_{x_1 \in \Omega_1} [I_1(x_1) + J(x_1, x_2)]$. Furthermore, the sequence $\{\mu_n\}$ obeys the LDP if $I(x_1, x_2)$ is a proper rate function.

The proof uses Varadhan's theorem on the asymptotic behavior of certain integrals. Theorem 2.3 generalizes Corollary 2.9 of Lynch and Sethuraman (1987) [...] product measures. The main theorem of Dinwoodie and Zabell (1992) also follows from our Theorem 2.3 as a special case where $\mu_{1n} = \mu_1$ for [...].

Theorem 2.3 is useful to establish the LDP for commonly occurring statistical [...] Zabell (1992). Other important applications of our main [...] the LDP for sample path processes can be found in [...] Zajic (1993). Our theorem is also useful to establish the LDP for the t[...] based on a random sample from a contaminated normal distribution, see [...].

[...] between several results in weak convergence [...] followed by a summary table in Vervaat (1988) listing further parallels. Recently, Puhalskii (1991) has carried the parallelism [...] by showing the equivalence of large deviation tightness and [...] possessing the LDP. Our [...] parallelism extends to the theorems in Sethuraman (196[?]) [...] distributions in terms of the convergence of marginals and conditional distributions. Some results similar to ours in the context of capacities [...] (1995).

The organization of this paper is as follows. In Section 2 we present the main theorem of this paper after stating some known theorems in large deviation theory. In Section 3, we show that a new interpretation of a theorem of Ellis (1984) provides sufficient conditions for the LDP continuity condition to hold for a sequence of probability transition functions defined on a Euclidean space. We also present examples of statistical distributions where the LDP continuity condition is satisfied. In Section 4, we establish the LDP for some sampling distributions that arise in statistical theory.

2.  Preliminaries  and  Main  Result 

In this section we state and prove the main theorem of this paper. We will first present some known results in large deviations which are needed in the proof of our main theorem. The interrelationship between the LDP, the WLDP and LD tightness is given in the following lemma, a proof of which can be found in Lynch and Sethuraman (1987) and Dembo and Zeitouni (1993).

LEMMA 2.1. Let $\{\mu_n\}$ be a sequence of probability measures defined on a Polish space $\Omega$. Then the following hold:

(1) If $\{\mu_n\}$ is LD tight and obeys the WLDP with rate function $I(x)$, then $I(x)$ is a proper rate function and $\{\mu_n\}$ obeys the LDP with proper rate function $I(x)$.

(2) If $\{\mu_n\}$ obeys the LDP with proper rate function $I(x)$, then $\{\mu_n\}$ is LD tight.

The following theorem, due to Varadhan (1966), plays an important role in the proof of our main theorem. The special case of Theorem 2.2 where $F_n = F$ for all $n$ and $F$ is a bounded continuous function is widely quoted in large deviation theory and is known as Varadhan's theorem on the asymptotics of integrals. See Ellis (1985) and Lynch and Sethuraman (1987). Theorem 2.2 is simply a combination of [...] 3.2, 3.3 and 3.5 in Varadhan (1966), and is useful in the proof of our main theorem.

THEOREM 2.2 (Varadhan). Let $\Omega_1$ be a Polish space. Let $\{\mu_{1n}\}$ be a sequence of probability measures on $(\Omega_1, \mathcal{B}_1)$. Assume that $\{\mu_{1n}\}$ obeys the LDP with proper rate function $I_1(x_1)$. Let $\{F_n(x_1)\}$ be a sequence of real valued functions and $F(x_1)$ be another real valued function. Let

$$H_n(B_1) = \int_{B_1} \exp(n F_n(x_1))\, d\mu_{1n}(x_1) \qquad ...(2.1)$$

for $B_1 \in \mathcal{B}_1$. Then the following hold:


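(1) Assume that there exists a constant $L < \infty$ such that $F_n(x_1) \le L$ for all $n$, $x_1 \in \Omega_1$. Suppose that $\limsup_n F_n(x_{1n}) \le F(x_1)$ for any sequence $x_{1n} \to x_1$. Then

$$\limsup_{n} \frac{1}{n} \log H_n(C_1) \le \sup_{x_1 \in C_1} [F(x_1) - I_1(x_1)] \qquad ...(2.2)$$

for any closed subset $C_1$ of $\Omega_1$.

(2) Suppose that $\liminf_n F_n(x_{1n}) \ge F(x_1)$ for any sequence $x_{1n} \to x_1$. Then

$$\liminf_{n} \frac{1}{n} \log H_n(G_1) \ge \sup_{x_1 \in G_1} [F(x_1) - I_1(x_1)] \qquad ...(2.3)$$

for any open set $G_1$ of $\Omega_1$.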
We  now  state  the  main  theorem  of  this  paper. 

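THEOREM 2.3. Let $\{\mu_{1n}\}$ be a sequence of probability measures on $(\Omega_1, \mathcal{B}_1)$ obeying the LDP with proper rate function $I_1(x_1)$. Suppose that the sequence of probability transition functions $\{\nu_n(x_1, \cdot),\ x_1 \in \Omega_1\}$ satisfies the LDP continuity condition with rate function $J(x_1, x_2)$. Then the sequence of joint distributions $\{\mu_n\}$ on the product space $\Omega = \Omega_1 \times \Omega_2$ obeys the WLDP with rate function $I(x_1, x_2) = I_1(x_1) + J(x_1, x_2)$. The sequence of marginal distributions $\{\mu_{2n}\}$ obeys the LDP with rate function

$$I_2(x_2) = \inf_{x_1 \in \Omega_1} [I_1(x_1) + J(x_1, x_2)]. \qquad ...(2.4)$$

Finally, $\{\mu_n\}$ obeys the LDP if $I(x_1, x_2)$ is a proper rate function.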

We  will  first  prove  a  simple  lemma. 

LEMMA 2.4. Let $I_1(x_1)$ be a proper rate function on $\Omega_1$ and $J(x_1, x_2)$ be a rate function on $\Omega_1 \times \Omega_2$. Then

$$I_2(x_2) = \inf_{x_1 \in \Omega_1} [I_1(x_1) + J(x_1, x_2)]$$

is a rate function on $\Omega_2$.

Proof. Let $L > 0$ be fixed. It suffices to show that the set $M = \{x_2 : I_2(x_2) \le L\}$ is closed. Let $\{x_{2n}\}$ in $M$ be such that $x_{2n} \to x_2^*$. There exists a sequence $\{x_{1n}\}$ such that

$$I_1(x_{1n}) + J(x_{1n}, x_{2n}) \le I_2(x_{2n}) + \frac{1}{n} \le L + \frac{1}{n}. \qquad ...(2.5)$$

Since $J(x_1, x_2)$ is a nonnegative function, (2.5) implies that $I_1(x_{1n}) \le L + 1$ for all $n \ge 1$. But the set $\{x_1 : I_1(x_1) \le L + 1\}$ is compact, and therefore there exists a subsequence $\{x_{1n}'\}$ of $\{x_{1n}\}$ such that $x_{1n}' \to x_1^*$ as $n \to \infty$. Since $J(x_1, x_2)$ is lower semi-continuous in $(x_1, x_2)$, from (2.5) it follows that

$$I_2(x_2^*) \le I_1(x_1^*) + J(x_1^*, x_2^*) \le \liminf_n I_1(x_{1n}') + \liminf_n J(x_{1n}', x_{2n}') \le L. \qquad ...(2.6)$$

Thus $x_2^* \in M$. This completes the proof of the lemma. □

Proof of Theorem 2.3. We first note that $I(x_1, x_2)$ is a rate function on $\Omega$. We will need this fact below in (2.10). Using Varadhan's theorem we will establish the upper bound (1.1) for closed rectangular sets. Let $C_1$ and $C_2$ be closed subsets of $\Omega_1$ and $\Omega_2$ respectively. If $F_n(x_1) = \frac{1}{n} \log[\nu_n(x_1, C_2)]$, then

$$\mu_n(C_1 \times C_2) = \int_{C_1} \nu_n(x_1, C_2)\, d\mu_{1n}(x_1) = \int_{C_1} \exp(n F_n(x_1))\, d\mu_{1n}(x_1). \qquad ...(2.7)$$

Note that $F_n(x_1) \le 0$ and $\limsup_n F_n(x_{1n}) \le -J(x_1, C_2)$ whenever $x_{1n} \to x_1$. Thus by Theorem 2.2 (1), we get

$$\limsup_{n} \frac{1}{n} \log \mu_n(C_1 \times C_2) = \limsup_{n} \frac{1}{n} \log \int_{C_1} \exp(n F_n(x_1))\, d\mu_{1n}(x_1) \le -\inf_{x_1 \in C_1} [I_1(x_1) + J(x_1, C_2)] = -I(C_1 \times C_2) \qquad ...(2.8)$$

and therefore the upper bound (1.1) for closed rectangular sets. In particular, choosing $C_1 = \Omega_1$ in (2.8) we get

$$\limsup_{n} \frac{1}{n} \log \mu_{2n}(C_2) \le -I(\Omega_1 \times C_2) = -I_2(C_2). \qquad ...(2.9)$$

Let $K \subset \Omega = \Omega_1 \times \Omega_2$ be compact and $l < I(K)$. For each $(x_1, x_2) \in K$, since $I(\cdot)$ is lower semi-continuous, there are open sets $O_{x_i}$ in $\Omega_i$ containing $x_i$, $i = 1, 2$, such that

$$I(O_{x_1} \times O_{x_2}) = \inf\{I(y_1, y_2) : (y_1, y_2) \in O_{x_1} \times O_{x_2}\} > l. \qquad ...(2.10)$$
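Furthermore, since $\Omega_i$ is Polish, we can find open subsets $N_{x_i}$ such that $x_i \in N_{x_i}$ and $\overline{N}_{x_i} \subset O_{x_i}$. Consider the open covering $\bigcup_{(x_1, x_2) \in K} N_{x_1} \times N_{x_2}$ of $K$. Because $K$ is compact we can extract a finite subcovering $\bigcup_{j=1}^{m} N_{x_{1j}} \times N_{x_{2j}}$ for $K$. Since $\overline{N}_{x_{ij}}$ is closed and $K$ is a subset of $\bigcup_{j=1}^{m} \overline{N}_{x_{1j}} \times \overline{N}_{x_{2j}}$, we get

$$\limsup_{n} \frac{1}{n} \log \mu_n(K) \le -\min_{1 \le j \le m} \{I(\overline{N}_{x_{1j}} \times \overline{N}_{x_{2j}})\} \le -\min_{1 \le j \le m} \{I(O_{x_{1j}} \times O_{x_{2j}})\} < -l \qquad ...(2.11)$$

for each $l < I(K)$, and hence $\limsup_n \frac{1}{n} \log \mu_n(K) \le -I(K)$. [...] and then deduce (1.2) for any open set.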
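Now let $G_i$ be open in $\Omega_i$ for $i = 1, 2$. We can write

$$\mu_n(G_1 \times G_2) = \int_{G_1} \nu_n(x_1, G_2)\, d\mu_{1n}(x_1) = \int_{G_1} \exp(n F_n(x_1))\, d\mu_{1n}(x_1) \qquad ...(2.12)$$

where $F_n(x_1) = \frac{1}{n} \log[\nu_n(x_1, G_2)]$. Since $\liminf_n F_n(x_{1n}) \ge F(x_1) = -J(x_1, G_2)$ whenever $x_{1n} \to x_1$, by Theorem 2.2 (2) we get

$$\liminf_{n} \frac{1}{n} \log \mu_n(G_1 \times G_2) = \liminf_{n} \frac{1}{n} \log \int_{G_1} \exp(n F_n(x_1))\, d\mu_{1n}(x_1) \ge -\inf_{x_1 \in G_1} [I_1(x_1) + J(x_1, G_2)] = -I(G_1 \times G_2). \qquad ...(2.13)$$

Choosing $G_1 = \Omega_1$ in (2.13) we get

$$\liminf_{n} \frac{1}{n} \log \mu_{2n}(G_2) \ge -I(\Omega_1 \times G_2) = -I_2(G_2). \qquad ...(2.14)$$

Now let $G$ be an open set in $\Omega_1 \times \Omega_2$. Fix $\epsilon > 0$ and choose $(x_1, x_2) \in G$ satisfying $I(x_1, x_2) < I(G) + \epsilon$. There exist open sets $O_{x_i}$ in $\Omega_i$ containing $x_i$, $i = 1, 2$, such that $O_{x_1} \times O_{x_2} \subset G$. Thus

$$\liminf_{n} \frac{1}{n} \log \mu_n(G) \ge \liminf_{n} \frac{1}{n} \log \mu_n(O_{x_1} \times O_{x_2}) \ge -I(x_1, x_2) \ge -I(G) - \epsilon. \qquad ...(2.15)$$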

Since $\epsilon > 0$ is arbitrary, this establishes (1.2) for any open set $G$. Thus $\{\mu_n\}$ obeys the WLDP. From (2.9), (2.14) and Lemma 2.4 we can see that $\{\mu_{2n}\}$ obeys the LDP [...].

Let us [...] a proper rate function. By the contraction principle (see Appendix), $I_i(x_i)$ [...] is LD tight. By Lemma 2.1 (1), the last assertion of Theorem 2.3 will be established if we show that $\{\mu_n\}$ is LD tight. Since $\{\mu_{in}\}$ is LD tight, given $N < \infty$, we can find a compact subset $K_i$ of $\Omega_i$ such that

$$\limsup_{n} \frac{1}{n} \log \mu_{in}(K_i^c) \le -2N \qquad ...(2.16)$$

for $i = 1, 2$. Hence there exists $n_0$ such that

$$\mu_{in}(K_i^c) \le \exp(-nN) \qquad ...(2.17)$$

for $i = 1, 2$ and $n \ge n_0$. Let $K = K_1 \times K_2$. Then $K$ is compact, being the product of two compact sets. For $n \ge n_0$, from (2.17) we get

$$\mu_n(K^c) \le \mu_{1n}(K_1^c) + \mu_{2n}(K_2^c) \le 2 \exp(-nN) \qquad ...(2.18)$$

which implies that

$$\limsup_{n} \frac{1}{n} \log \mu_n(K^c) \le -N. \qquad ...(2.19)$$

This completes the proof of Theorem 2.3. □


REMARK 2.5. It is interesting to note that $\{\mu_{2n}\}$ satisfies the upper bound (2.9) for closed sets and the lower bound (2.14) for open sets, even if $J(x_1, x_2)$ is not lower semi-continuous in $(x_1, x_2)$. The lower semi-continuity of $J(x_1, x_2)$ as a function of $(x_1, x_2)$ is needed to show that $I_2(x_2)$ is a rate function.

We shall now give some sufficient conditions on the rate functions $I_1(x_1)$ and $J(x_1, x_2)$ which guarantee that $I(x_1, x_2)$ is a proper rate function.

LEMMA 2.6. Let $\Omega_1$ and $\Omega_2$ be two Polish spaces. Let $J : \Omega = \Omega_1 \times \Omega_2 \to [0, \infty]$ be a function satisfying the following conditions.

(a) [...]

(b) The set $\bigcup_{x_1 \in K_1} \{x_2 : J(x_1, x_2) \le L\}$ is a compact subset of $\Omega_2$ for any $L > 0$ and for any compact set $K_1$ of $\Omega_1$.

Let $I_1(x_1)$ be a proper rate function defined on $\Omega_1$ and $I(x_1, x_2) = I_1(x_1) + J(x_1, x_2)$. Then

(1) [...] rate function on $\Omega_2$.

(2) $I(x_1, x_2)$ is a proper rate function on $\Omega$.

Proof. [...] show that $I(x_1, x_2)$ has compact level sets. Fix $L > 0$ and let

$$M = \{(x_1, x_2) : I(x_1, x_2) \le L\}. \qquad ...(2.20)$$

Note that $M$ is a closed subset of $\Omega$ since $I(x_1, x_2)$ is lower semi-continuous. Let $K_1 = \{x_1 : I_1(x_1) \le L\}$. It is easy to verify that

$$M \subset K_1 \times \bigcup_{x_1 \in K_1} \{x_2 : J(x_1, x_2) \le L\}. \qquad ...(2.21)$$

[...]

[...] condition (b) of Lemma 2.6, then it [...] lower [...] is a proper rate function.

LEMMA 2.7. Let $\Omega_1$ [...] and $\Omega_2$ be a Polish space. Let $J(x_1, x_2)$ be a non[...] [...] lower semi-continuous on $\Omega_1$ for each $x_2 \in \Omega_2$ [...] on $\Omega$.

Proof. We first note that $\Omega_1$ is a Polish space [...] metric space is topologically complete (cf. Dugundji (197[?]), Corollary 2.4, page 294). Let $L > 0$. It suffices to show that $M = \{(x_1, x_2) : J(x_1, x_2) \le L\}$ is closed in $\Omega$. Note that we can represent the set $M$ as

$$M = \bigcup_{x_1 \in \Omega_1} (\{x_1\} \times M_{x_1}) \qquad ...(2.22)$$

where $M_{x_1} = \{x_2 : J(x_1, x_2) \le L\}$, for $x_1 \in \Omega_1$. Let $(x_{1n}, x_{2n})$ be a sequence of points in $M$ such that $(x_{1n}, x_{2n}) \to (x_1^*, x_2^*)$ as $n \to \infty$. [...] compact there exists $K_{x_1} \supset K_{x_2} \supset \cdots$, a countable compact neighborhood base of $x_1^*$. Now for each $j \ge 1$, we can find $n_j$ such that $x_{1n} \in K_{x_j}$ for all $n \ge n_j$. Thus we have for $n \ge n_j$,

$$x_{2n} \in \bigcup_{x_1 \in K_{x_j}} M_{x_1} \qquad ...(2.23)$$

which is a compact set by hypothesis (b). Hence $x_2^* \in$ [...] for all $j$ [...]. Since $J(\cdot, x_2)$ is lower semi-continuous in the first coordinate we can verify

[...] ...(2.24)

which implies that $x_2^* \in M_{x_1^*}$, that is, $(x_1^*, x_2^*) \in M$. This completes the proof of the lemma. □

REMARK 2.8. Assume that $\{\mu_{1n}\}$ obeys the LDP with proper rate function $I_1(x_1)$. Suppose that the [...] given by [...]. Then it trivially follows from Theorem 2.3 that the sequence of joint distributions $\{\mu_n\}$ obeys the LDP with proper rate function

$$I(x_1, x_2) = I_1(x_1) + I_2(x_2), \qquad ...(2.25)$$

a result originally obtained by Lynch and Sethuraman (1987); see Corollary 2.9 in their paper.

REMARK 2.9. Theorem 2.3 of Dinwoodie and Zabell [...] from our Theorem 2.3 as follows: Suppose $\mu_{1n} = \mu_1$ [...], say $S(\mu_1)$, is compact. Then it is easy to see that the sequence $\{\mu_{1n}\}$ obeys the LDP with proper rate function

$$I_1(x_1) = \begin{cases} 0 & \text{if } x_1 \in S(\mu_1) \\ \infty & \text{otherwise.} \end{cases} \qquad ...(2.26)$$

Simply put, a single measure [...] rate function. Further, the rate function defined in (2.26) is a proper rate function [...]. [...] it follows from our Theorem 2.3 that the sequence of marginal measures $\{\mu_{2n}\}$ obeys the LDP with rate function $I_2(x_2) = \inf_{x_1 \in S(\mu_1)} J(x_1, x_2)$.
3.  LDP  Continuity  for  Probability  Transition  Functions  on  $R^d$

In this section we will examine sufficient conditions for the LDP continuity condition to hold for an arbitrary sequence of probability transition functions defined on a Euclidean space. Dinwoodie and Zabell (1992) have given [...] however their sufficient conditions are restrictive. Even in the simplest case where $\Omega_2$ is the real line, their sufficient conditions are not satisfied [...] of i.i.d. random variables from basic statistical distributions, see Remark 3.7 [...].

We will first discuss a method of verifying the exponential continuity condition for an arbitrary sequence of transition functions, by giving a new interpretation to a theorem of Ellis (1984).

[...] $\Omega_1$ is a Polish space and for each $x_1 \in \Omega_1$, let $\{\nu_n(x_1, \cdot)\}$ be a sequence of probability transition functions on $\Omega_2 = R^d$. We can verify that the exponential continuity condition holds for the sequence $\{\nu_n(x_1, \cdot),\ x_1 \in \Omega_1\}$ [...] the distribution of $Y_n/n$ is given by $\nu_n(x_{1n}, \cdot)$. Define

$$c_n(x_{1n}, t) = \frac{1}{n} \log E[\exp\langle t, Y_n \rangle]. \qquad ...(3.1)$$

Suppose that $\lim_n c_n(x_{1n}, t)$ exists and equals $c(x_1, t)$ for [...] allow $+\infty$ both as a limit value and as an element in the sequence $\{c_n(x_{1n}, t)\}$. [...] $D_{x_1}(c) = \{t \in R^d : c(x_1, t) < \infty\}$. The function $c(x_1, t)$ is said to be closed if $\{t : c(x_1, t) \le a\}$ is closed for each real $a$. This is equivalent to $c(x_1, t)$ being lower semi-continuous. If $c(x_1, t)$ is differentiable on the interior of $D_{x_1}(c)$, then we call $c(x_1, t)$ steep if $\|\mathrm{grad}(c(x_1, t_n))\| \to \infty$ for any sequence $\{t_n\} \subset \mathrm{int}(D_{x_1}(c))$ which tends to a boundary point of $D_{x_1}(c)$. For $x_2 \in R^d$, let

$$J(x_1, x_2) = \sup_{t \in R^d} [\langle t, x_2 \rangle - c(x_1, t)] \qquad ...(3.2)$$

be the Legendre-Fenchel transform of $c(x_1, t)$.
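The Legendre-Fenchel transform in (3.2) can be approximated numerically in one dimension by maximizing over a grid of $t$ values. The sketch below is an illustration under stated assumptions: it takes $d = 1$ and uses the normal cumulant function $c(\theta, t) = \theta_1 t + \theta_2 t^2/2$, for which the transform has the closed form $(x_2 - \theta_1)^2 / (2\theta_2)$ when $\theta_2 > 0$. The function names and the grid are hypothetical choices.

```python
def legendre_fenchel(c, x2, t_grid):
    """Approximate sup over t of [t * x2 - c(t)] on a finite grid (one-dimensional case)."""
    return max(t * x2 - c(t) for t in t_grid)


def normal_cumulant(theta1, theta2):
    """Return c(t) = theta1 * t + theta2 * t**2 / 2 for a normal law with mean theta1, variance theta2."""
    return lambda t: theta1 * t + 0.5 * theta2 * t * t


# Grid of t values; the true maximizer must lie inside it for the approximation to be good.
T_GRID = [k / 100.0 for k in range(-1000, 1001)]

if __name__ == "__main__":
    theta1, theta2 = 1.0, 2.0  # assumed illustrative parameter values
    c = normal_cumulant(theta1, theta2)
    for x2 in (1.0, 3.0, -1.0):
        closed_form = (x2 - theta1) ** 2 / (2.0 * theta2)
        # The two printed values should agree up to grid error.
        print(x2, legendre_fenchel(c, x2, T_GRID), closed_form)
```

A grid maximum always underestimates the true supremum, so it is a lower bound that tightens as the grid is refined and widened.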


THEOREM 3.1 (Ellis). If $D_{x_1}(c)$ has a nonempty interior [...] and $c(x_1, t)$ is a closed, convex function [...] defined on $R^d$ [...] of probability measures $\{\nu_n(x_{1n}, \cdot)\}$ satisfies the upper bound (1.1) for all closed sets $C$ of $R^d$ [...] if $c(x_1, t)$ is differentiable on [...] $\{\nu_n(x_{1n}, \cdot)\}$ satisfies the lower bound (1.2) for all open sets $G$ of $R^d$ with proper [...].

The next lemma, due to Dinwoodie and Zabell (1992) (see Lemma [...] in their paper), provides a simple sufficient condition for the function $J(x_1, x_2)$ defined by (3.2) to be lower semi-continuous in $(x_1, x_2)$. Note that Theorem 3.1, in conjunction with Lemma 3.2, provides sufficient conditions for a sequence of probability transition functions on a Euclidean space to satisfy the LDP continuity condition.

LEMMA 3.2. Let $\{x_{1n}\}$ be a sequence and $x_1$ in $\Omega_1$ be such that $x_{1n} \to x_1$. If for every $t \in \mathbb{R}^d$, there exists a sequence $t_n \to t$ such that

$$\limsup_n c(x_{1n}, t_n) \le c(x_1, t) \tag{3.3}$$

then the function $J(x_1, x_2)$ defined by (3.2) is lower semi-continuous in $(x_1, x_2)$.

We will now present a number of examples where the LDP continuity condition holds. In Examples 3.3, 3.4, 3.5 and 3.6 below, $Y_n$ is the sum of $n$ i.i.d. random variables $X_1, \ldots, X_n$. Furthermore, the common distribution of the $X_i$'s is indexed by a parameter $\theta \in \Omega_1$. Hence the function defined by (3.1) is independent of $n$, but depends on the parameter $\theta$, and therefore we denote it by $c(\theta, t)$. In all of the four examples, it is easy to verify that $c(\theta, t)$ as a function of $t$ is closed, convex and steep, for fixed $\theta \in \Omega_1$. Clearly, $c(\theta, t)$ is continuous in $\theta$ for each $t$ in these examples and therefore condition (3.3) is trivially satisfied. Let the distribution of the sample mean $\bar{X}_n = Y_n/n$ be given by $\nu_n(\theta, \cdot)$. It follows from Theorem 3.1 and Lemma 3.2 that the sequence of probability transition functions $\{\nu_n(\theta, \cdot),\ \theta \in \Omega_1\}$, in all of the four examples, satisfies the LDP continuity condition with rate function $J(\theta, z) = \sup_{t \in \mathbb{R}} [tz - c(\theta, t)]$. We will omit details and present only the distribution, the function $c(\theta, t)$ and the rate function $J(\theta, z)$. The random variables are defined to be degenerate when $\theta$ is a boundary point of $\Omega_1$. In (3.8) and elsewhere in this paper we let $0 \log 0 = 0$.

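Example 3.3. Let $X_1, \ldots, X_n$ be i.i.d. normal with mean $\theta_1$ and variance $\theta_2$. If we let $\theta = (\theta_1, \theta_2)$ then $\theta \in \Omega_1 = (-\infty, \infty) \times [0, \infty)$. It is easy to verify that

$$c(\theta, t) = \theta_1 t + \frac{1}{2}\theta_2 t^2, \quad -\infty < t < \infty, \tag{3.4}$$

for $\theta \in \Omega_1$. The sequence of probability distributions of the sample means $\{\bar{X}_n\}$ satisfies the LDP continuity condition with rate function

$$J(\theta, z) = \frac{(z - \theta_1)^2}{2\theta_2}, \quad -\infty < z < \infty, \tag{3.5}$$

for $\theta_2 > 0$, $-\infty < \theta_1 < \infty$; and

$$J(\theta, z) = \begin{cases} 0 & \text{if } z = \theta_1 \\ \infty & \text{otherwise} \end{cases} \tag{3.6}$$

when $\theta_2 = 0$ and $-\infty < \theta_1 < \infty$.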

Example 3.4. Let $X_1, \ldots, X_n$ be i.i.d. Bernoulli with mean $\theta$, $\theta \in \Omega_1 = [0, 1]$. The function $c(\theta, t)$ is given by

$$c(\theta, t) = \log[\theta \exp(t) + (1 - \theta)], \quad -\infty < t < \infty, \tag{3.7}$$

for $\theta \in \Omega_1$. The sequence of probability distributions of the sample means $\{\bar{X}_n\}$ satisfies the LDP continuity condition with rate function

$$J(\theta, z) = \begin{cases} z \log\dfrac{z}{\theta} + (1 - z) \log\dfrac{(1 - z)}{(1 - \theta)} & \text{if } 0 \le z \le 1 \\ \infty & \text{otherwise,} \end{cases} \tag{3.8}$$

for $\theta \in \Omega_1$.
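The closed form (3.8) can be checked against a direct numerical evaluation of $\sup_t [tz - c(\theta, t)]$ with $c$ from (3.7). The sketch below is illustrative only: the function names and the search interval $[-30, 30]$ are assumptions, and it restricts attention to interior values $0 < \theta < 1$.

```python
import math

def bernoulli_cgf(theta, t):
    """c(theta, t) = log[theta*exp(t) + (1 - theta)], as in (3.7)."""
    return math.log(theta * math.exp(t) + (1.0 - theta))

def bernoulli_rate(theta, z):
    """Closed form (3.8) with the convention 0*log(0) = 0; assumes 0 < theta < 1."""
    if z < 0.0 or z > 1.0:
        return math.inf
    total = 0.0
    if z > 0.0:
        total += z * math.log(z / theta)
    if z < 1.0:
        total += (1.0 - z) * math.log((1.0 - z) / (1.0 - theta))
    return total

def bernoulli_rate_numeric(theta, z, t_max=30.0, steps=60001):
    """Grid approximation to sup_t [t*z - c(theta, t)] over [-t_max, t_max]."""
    best = -math.inf
    for i in range(steps):
        t = -t_max + 2.0 * t_max * i / (steps - 1)
        best = max(best, t * z - bernoulli_cgf(theta, t))
    return best

print(bernoulli_rate(0.3, 0.6), bernoulli_rate_numeric(0.3, 0.6))
```

The two values agree closely for interior $z$; at $z = 0$ or $z = 1$ the supremum is approached only as $t \to \mp\infty$, so a bounded grid slightly underestimates it there.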

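Example 3.5. Let $X_1, \ldots, X_n$ be i.i.d. with density

$$f_\theta(x) = \frac{\exp(-x/\theta)\, x^{\alpha - 1}}{\theta^\alpha\, \Gamma(\alpha)}, \quad x > 0,$$

$\alpha > 0$ and $\theta \in \Omega_1 = [0, \infty)$. Then for any $\theta \in \Omega_1$, we have

$$c(\theta, t) = \begin{cases} -\alpha \log(1 - \theta t) & \text{if } t < 1/\theta \\ \infty & \text{otherwise.} \end{cases} \tag{3.9}$$

The sequence of probability distributions of the sample means $\{\bar{X}_n\}$ satisfies the LDP continuity condition with rate function

$$J(\theta, z) = \begin{cases} \dfrac{1}{\theta}\,[z + \alpha\theta(\log(\alpha\theta) - 1 - \log(z))] & \text{if } z > 0 \\ \infty & \text{otherwise,} \end{cases} \tag{3.10}$$

for each $\theta > 0$; and

$$J(0, z) = \begin{cases} 0 & \text{if } z = 0 \\ \infty & \text{otherwise.} \end{cases}$$

When $\alpha = 1/2$ and $\theta = 2$ the distribution of $n\bar{X}_n$ is $\chi^2(n)$. Thus if $Y_n \sim \chi^2(n)$, then $\{Y_n/n\}$ obeys the LDP with proper rate function

$$J(z) = \begin{cases} \dfrac{1}{2}\,[z - 1 - \log(z)] & \text{if } z > 0 \\ \infty & \text{otherwise.} \end{cases} \tag{3.11}$$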
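As a quick consistency check, (3.10) with $\alpha = 1/2$ and $\theta = 2$ should reduce to (3.11), and both should match the Legendre-Fenchel transform of $c(\theta, t) = -\alpha \log(1 - \theta t)$. The following sketch assumes $\theta > 0$; the function names and the lower search limit of $-50$ are hypothetical choices made for illustration.

```python
import math

def gamma_rate(theta, alpha, z):
    """Closed form (3.10), valid for theta > 0."""
    if z <= 0.0:
        return math.inf
    return (z + alpha * theta * (math.log(alpha * theta) - 1.0 - math.log(z))) / theta

def chi_square_rate(z):
    """Closed form (3.11) for Y_n / n with Y_n ~ chi-square(n)."""
    if z <= 0.0:
        return math.inf
    return 0.5 * (z - 1.0 - math.log(z))

def gamma_rate_numeric(theta, alpha, z, t_lo=-50.0, steps=200000):
    """Grid approximation to sup over t < 1/theta of [t*z + alpha*log(1 - theta*t)]."""
    t_hi = 1.0 / theta
    best = -math.inf
    for i in range(1, steps):  # stay strictly below 1/theta
        t = t_lo + (t_hi - t_lo) * i / steps
        best = max(best, t * z + alpha * math.log(1.0 - theta * t))
    return best

print(gamma_rate(2.0, 0.5, 2.5), chi_square_rate(2.5), gamma_rate_numeric(2.0, 0.5, 2.5))
```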

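Example 3.6. Let $X_1, \ldots, X_n$ be i.i.d. with density $f_\theta(x) = \theta \exp(\theta(x - \mu))$, $x \le \mu$, and $\theta \in \Omega_1 = (0, \infty)$. The function $c(\theta, t)$ is given by

$$c(\theta, t) = \begin{cases} \mu t + \log(\theta) - \log(\theta + t) & \text{if } \theta + t > 0 \\ \infty & \text{otherwise.} \end{cases} \tag{3.12}$$

Then $\{\bar{X}_n\}$ satisfies the LDP continuity condition with rate function

$$J(\theta, z) = \begin{cases} \theta(\mu - z) - \log(\theta(\mu - z)) - 1 & \text{if } z < \mu \\ \infty & \text{otherwise,} \end{cases} \tag{3.13}$$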


for $\theta > 0$.

REMARK 3.7. Dinwoodie and Zabell (1992) considered Example 3.6, restricting the parameter space ..., and showed that the exponential continuity condition is satisfied using a sufficient condition, which they attribute to ... (see Example 3.3 of their paper). ... for the full parameter space in Example 3.6 even though the exponential continuity condition holds.

4. Statistical Applications

Application 4.1. LDP for noncentral $t$-distributions. We have seen that the LDP continuity condition is satisfied by the distributions of averages of i.i.d. random variables ... distributions. In this section, we will show that Theorem 2.3 can be used to show that the LDP holds for ... statistical distributions derived from the basic statistical distributions. In Example 4.1, we show that the noncentral $t$-distribution obeys the LDP and identify its rate function. The rate function for the central $t$-distribution ... several authors ... et al. (1972) and more recently ... (1989) for the noncentral $t$-distributions using different methods.

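Example 4.1. Let $X_1, \ldots, X_n$ be i.i.d. normal with mean $\theta$ and variance 1. Let $\bar{X} = \sum_{i=1}^n X_i/n$ be the sample mean and $S^2 = \sum_{i=1}^n (X_i - \bar{X})^2/(n - 1)$ be the sample variance. Let $\Omega_1 = (0, \infty)$ and $\Omega_2 = (-\infty, \infty)$; both are topologically complete and separable. Let $T_n = \bar{X}/\sqrt{S^2}$ be the $t$-statistic. The distribution of $S^2$ satisfies the LDP with proper rate function

$$I_1(s) = (s - 1 - \log(s))/2 \tag{4.1}$$

for $s \in \Omega_1$. Let $\nu_n(s, \cdot)$ be the conditional distribution of $T_n$ given $S^2 = s$, $s > 0$. Since $\nu_n(s, \cdot)$ is just the normal distribution with mean $\theta/\sqrt{s}$ and variance $1/(ns)$, by Example 3.3 we have that $\{\nu_n(s, \cdot)\}$ satisfies the LDP continuity condition with rate function

$$J_\theta(s, t) = \frac{s\,(t - \theta/\sqrt{s})^2}{2}, \quad -\infty < t < \infty, \tag{4.2}$$

for $s > 0$. By Theorem 2.3 the joint distribution of $(S^2, T_n)$ satisfies the LDP with proper rate function

$$I(s, t) = I_1(s) + J_\theta(s, t), \quad s > 0,\ -\infty < t < \infty. \tag{4.3}$$

Furthermore, the marginal distribution of $T_n$ obeys the LDP with proper rate function

$$I_2(t) = \inf_{s > 0} [I_1(s) + J_\theta(s, t)] = \inf_{s > 0} \left[(s - 1 - \log(s))/2 + (t\sqrt{s} - \theta)^2/2\right] \tag{4.4}$$

which after simplification reduces to

$$I_2(t) = \frac{1}{2}\{\theta^2 - s t \theta - 2\log(s)\}, \quad -\infty < t < \infty, \tag{4.5}$$

where, in (4.5), $s$ is the positive root of the quadratic equation $s^2(1 + t^2) - s t \theta - 1 = 0$ (this root is the square root of the minimizing argument in (4.4)). Thus for any measurable subset $A$ of the real line, with interior $A^\circ$ and closure $\bar{A}$, we have

$$-I_2(A^\circ) \le \liminf_n \frac{1}{n} \log \Pr(T_n \in A) \le \limsup_n \frac{1}{n} \log \Pr(T_n \in A) \le -I_2(\bar{A}). \tag{4.6}$$

The above inequality (4.6) was established by other authors, using analytical calculations, when $A$ is an interval of the form $A = (t, \infty)$.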
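The reduction from (4.4) to (4.5) can be checked numerically. The sketch below, with hypothetical function names and an assumed search range $0 < s \le 50$, compares the closed form built from the positive root of $u^2(1 + t^2) - u t \theta - 1 = 0$ with a brute-force minimisation of (4.4) over the variance argument $s$. For $\theta = 0$ the closed form reduces to $\frac{1}{2}\log(1 + t^2)$.

```python
import math

def t_rate_closed(t, theta):
    """I_2(t) from (4.5), using the positive root u of u^2 (1 + t^2) - u t theta - 1 = 0."""
    a = 1.0 + t * t
    b = -t * theta
    u = (-b + math.sqrt(b * b + 4.0 * a)) / (2.0 * a)
    return 0.5 * (theta * theta - u * t * theta - 2.0 * math.log(u))

def t_rate_grid(t, theta, s_max=50.0, steps=500000):
    """Direct grid minimisation of (4.4) over s in (0, s_max]."""
    best = math.inf
    for i in range(1, steps + 1):
        s = s_max * i / steps
        value = 0.5 * (s - 1.0 - math.log(s)) + 0.5 * (t * math.sqrt(s) - theta) ** 2
        best = min(best, value)
    return best

print(t_rate_closed(1.0, 0.5), t_rate_grid(1.0, 0.5))
```

The rate vanishes at $t = \theta$, which is the almost sure limit of $T_n$ in this example.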

Application 4.2. ... The resampling procedure known as the “bootstrap”, introduced by Efron, ... in statistical methodology in recent years. The method is as follows: Let $E$ be a Polish space and $\Omega_1$ be the class of all probability measures defined on the collection of ... topology of weak convergence. ... $(\Omega_1, \rho)$ is a Polish space, where $\rho$ is the Lévy-Prohorov metric; see Deuschel and Stroock (1989), page 64. Fix $P \in \Omega_1$. Let $X = (X_1, \ldots, X_n)$ be a sample of $n$ i.i.d. observations ... $(B)$ is simply ... proper rate function that depends on the Kullback-Leibler number. For $Q, P \in \Omega_1$ the Kullback-Leibler number is defined as

$$K(Q, P) = \begin{cases} \int q \log q \, dP & \text{if } Q \ll P \\ \infty & \text{otherwise} \end{cases} \tag{4.7}$$

where $q$ is the Radon-Nikodym derivative of $Q$ with respect to $P$.

The next theorem is the main result of this section.

THEOREM 4.2. Let $\Omega_1$ be the class of all probability measures ... $X_1, \ldots, X_n$ ... of the bootstrap sample $X_1^*, \ldots, X_n^*$. Let $\hat{P}_n$, $P_n^*$ be the empirical measures ... Then the joint distribution of $(\hat{P}_n, P_n^*)$ obeys the LDP in the weak topology with proper rate function

$$K((Q_1, Q_2), P) = [K(Q_2, Q_1) + K(Q_1, P)]. \tag{4.8}$$

PROOF. ... satisfy the LDP continuity condition ... and Theorem 2.3 we ... with proper rate function

$$K((Q_1, Q_2), P) = [I_1(Q_1) + J(Q_1, Q_2)] = [K(Q_1, P) + K(Q_2, Q_1)] = [K(Q_2, Q_1) + K(Q_1, P)]. \tag{4.9}$$


The proof of the next lemma is similar to the proof of Lemma 2.3 (a) in Groeneboom et al. (1979).

LEMMA 4.3. Let $M_1$ be a compact subset of $\Omega_1$ in the weak topology and $L \ge 0$. Then the set

$$M_2 = \{Q \in \Omega_1 : K(Q, R) \le L \text{ for some } R \in M_1\}$$

is also compact in the weak topology.

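PROOF. ... Let $\epsilon > 0$ and choose $a > 1$ such that $(L + 1/e)/\log a \le \epsilon/2$. ... there exists a compact set $V = V_\epsilon$ in $E$ such that

$$R(V^c) \le \epsilon/(2a) \quad \text{for all } R \in M_1. \tag{4.11}$$

Let $Q \in M_2$ and $R \in M_1$ be such that $K(Q, R) \le L$. Let $g$ be the Radon-Nikodym derivative of $Q$ with respect to $R$. Since $x \log(x) \ge -1/e$ for $x \ge 0$, we have

$$\int_{\{g > a\}} g \log g \, dR \le L + 1/e. \tag{4.12}$$

Hence

$$Q(V^c) = \int_{V^c \cap \{g \le a\}} g \, dR + \int_{V^c \cap \{g > a\}} g \, dR \le a R(V^c) + \frac{1}{\log a} \int_{\{g > a\}} g \log g \, dR \le \epsilon/2 + \epsilon/2 = \epsilon. \tag{4.13}$$

Therefore we have

$$Q(V^c) \le \epsilon \quad \text{for all } Q \in M_2, \tag{4.14}$$

which implies that $M_2$ is compact. This completes the proof of the lemma. □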

By Theorem 2.3 we can also conclude that the marginal distribution of $P_n^*$ satisfies the LDP with proper rate function

$$K^*(Q, P) = \inf_R [K(Q, R) + K(R, P)]. \tag{4.15}$$

It is interesting to note that $K(Q, P) \ge K^*(Q, P)$ for all $Q, P \in \Omega_1$ (equality holds iff $P = Q$). Thus the rate function of the ordinary empirical measure $\hat{P}_n$ is always greater than or equal to the rate function of the marginal distribution of the bootstrap empirical measure $P_n^*$.
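The comparison between $K$ and $K^*$ can be illustrated on a finite sample space. The sketch below is an assumption-laden toy: it takes distributions on three points, approximates the infimum in (4.15) by a grid over the probability simplex, and uses hypothetical function names `kl` and `k_star`. Choosing $R = Q$ or $R = P$ already shows $K^* \le K$; the grid shows that intermediate choices of $R$ can do strictly better.

```python
import math

def kl(q, p):
    """Kullback-Leibler number K(Q, P) of (4.7) for distributions on a finite set."""
    total = 0.0
    for qi, pi in zip(q, p):
        if qi > 0.0:
            if pi == 0.0:
                return math.inf  # Q is not absolutely continuous with respect to P
            total += qi * math.log(qi / pi)
    return total

def k_star(q, p, grid=200):
    """Grid approximation to inf over R of [K(Q, R) + K(R, P)] on three points."""
    best = min(kl(q, q) + kl(q, p), kl(q, p) + kl(p, p))  # candidates R = Q and R = P
    for i in range(grid + 1):
        for j in range(grid + 1 - i):
            r = (i / grid, j / grid, (grid - i - j) / grid)
            best = min(best, kl(q, r) + kl(r, p))
    return best

q = (0.5, 0.3, 0.2)
p = (0.2, 0.3, 0.5)
print(kl(q, p), k_star(q, p))
```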

Application 4.3. Parametric Bootstrap. Let $X_1, \ldots, X_n$ be i.i.d. random vectors with distribution given by $P_\theta$, indexed by the parameter $\theta \in \Omega_1 \subset \mathbb{R}^k$, where $\Omega_1$ is a Polish space. Let $\hat\theta_n = T_n(X_1, \ldots, X_n)$ belonging to $\Omega_1$ be an estimate of $\theta$. In the parametric bootstrap method, given $\hat\theta_n = z_1$, the bootstrap sample is a sequence of i.i.d. observations $X_1^*, \ldots, X_n^*$ from $P_{z_1}$. Let $\theta_n^* = T_n(X_1^*, \ldots, X_n^*)$ be the estimate of $\theta$ based on the bootstrap sample $(X_1^*, \ldots, X_n^*)$. Suppose that $\hat\theta_n$ satisfies the LDP continuity condition with rate function $J(\theta, z)$, $\theta, z \in \Omega_1$. Then by Theorem 2.3 it follows that the joint distribution of $(\hat\theta_n, \theta_n^*)$ satisfies the LDP with proper rate function

$$I(\theta, (z_1, z_2)) = [J(\theta, z_1) + J(z_1, z_2)]. \tag{4.16}$$

Also, the marginal distribution of $\theta_n^*$ satisfies the LDP with proper rate function

$$I_2(\theta, z_2) = \inf_z [J(\theta, z) + J(z, z_2)]. \tag{4.17}$$

It is interesting to note that $J(\theta, z) \ge I_2(\theta, z)$ for all $(\theta, z)$, that is, the rate function of the marginal distribution of $\theta_n^*$ is never greater than the rate function of the distribution of $\hat\theta_n$. We now present ... Examples 4.4 and 4.5 it is easy to check that for each fixed $\theta$ the rate functions $J(\theta, z)$ are ... continuous in $(\theta, z)$.

Example 4.4. Let $X_1, \ldots, X_n$ be i.i.d. Bernoulli with parameter $\theta \in \Omega_1 = [0, 1]$. Let $\hat\theta_n = \bar{X}_n$, where $\bar{X}_n$ is the sample mean. Let $X_1^*, \ldots, X_n^*$ be i.i.d. Bernoulli with parameter $z_1 = \bar{X}_n$. Let $\theta_n^* = \bar{X}_n^*$ be the estimate of $\theta$ based on the bootstrap sample $(X_1^*, \ldots, X_n^*)$. It follows then from Example 3.4 and Theorem 2.3 that the joint distribution of $(\hat\theta_n, \theta_n^*)$ satisfies the LDP with proper rate function

$$I(\theta, (z_1, z_2)) = [J(\theta, z_1) + J(z_1, z_2)] \tag{4.18}$$

where $J(\theta, z)$ is given by (3.8). The marginal distribution of $\theta_n^*$ also satisfies the LDP with proper rate function

$$I_2(\theta, z_2) = \inf_z [J(\theta, z) + J(z, z_2)]. \tag{4.19}$$
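For the Bernoulli case the infimum in (4.19) is one-dimensional and easy to approximate. The following is a rough sketch, not the paper's computation: it evaluates the joint rate (4.18) on a uniform grid of intermediate values $z$ in $[0, 1]$, and the function names and grid size are illustrative assumptions.

```python
import math

def bernoulli_rate(theta, z):
    """Rate function (3.8) with 0*log(0) = 0; +inf where the formula degenerates."""
    if not 0.0 <= z <= 1.0:
        return math.inf
    total = 0.0
    for mass, base in ((z, theta), (1.0 - z, 1.0 - theta)):
        if mass > 0.0:
            if base <= 0.0:
                return math.inf
            total += mass * math.log(mass / base)
    return total

def joint_rate(theta, z1, z2):
    """Joint rate (4.18): J(theta, z1) + J(z1, z2)."""
    return bernoulli_rate(theta, z1) + bernoulli_rate(z1, z2)

def marginal_rate(theta, z2, steps=10000):
    """Grid approximation to (4.19): inf over z in [0, 1] of the joint rate."""
    return min(joint_rate(theta, i / steps, z2) for i in range(steps + 1))

print(bernoulli_rate(0.3, 0.6), marginal_rate(0.3, 0.6))
```

With $\theta = 0.3$ and $z_2 = 0.6$ the bootstrap marginal rate comes out clearly smaller than $J(\theta, z_2)$, in line with the remark following (4.17).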


Example 4.5. Let $X_1, \ldots, X_n$ be i.i.d. exponential with mean $1/\theta$, that is ... $\theta \in \Omega_1 = (0, \infty)$. Let ... the joint distribution of $(\hat\theta_n, \theta_n^*)$ satisfies the LDP with proper rate function

$$I(\theta, (z_1, z_2)) = \ldots \tag{4.20}$$

for $\theta > 0$. The marginal distribution of $\theta_n^*$ also obeys the LDP with proper rate function

$$I_2(\theta, z_2) = \ldots$$

for $\theta > 0$.
Appendix


The  following  result,  known  as  the  contraction  principle,  is  a  very  useful 
device  to  deduce  the  LDP  for  a  sequence  of  measures  induced  by  a  continuous 
function  from  a  sequence  of  measures  which  are  known  to  obey  the  LDP.  An 
extension  of  the  contraction  principle  for  measurable  functions  h  can  be  found 

in Puhalskii (1991).

Contraction principle. Let $h : \Omega \to \Omega'$ be a continuous function where $\Omega$ and $\Omega'$ are two Polish spaces. Let $I$ be a proper rate function on $\Omega$. Then $I'(y) = \inf\{I(x) : h(x) = y\}$ is a proper rate function on $\Omega'$. Suppose that $\{\mu_n\}$ is a sequence of probability measures on $\Omega$ which obeys the LDP with proper rate function $I$. Then the sequence $\{\mu_n h^{-1}\}$ obeys the LDP with proper rate function $I'$.
Abstract

In this paper we derive the exact Bahadur slope of the $t$-statistic based on a random sample from a contaminated normal distribution, using some results in large deviation theory. We also present a table of exact Bahadur slopes at various alternatives at several levels of contamination.

AMS  classification:  primary  62F03;  62F05;  62G35;  secondary  60F10 
Keywords:  Bahadur  slope;  Large  deviations;  Robustness;  Tukey  model 


1.  Introduction 

To study robustness of standard tests of location in a normal model, one generally studies their properties under the Tukey model (see Tukey, 1960) of contaminated normal alternatives, namely, the probability distributions $P_{(\epsilon,\theta,\sigma)}$ with probability density function (pdf)

$$f_{(\epsilon,\theta,\sigma)}(x) = (1 - \epsilon)\,\phi(x; \theta, 1) + \epsilon\,\phi(x; \theta, \sigma) \tag{1}$$

for $0 < \epsilon < 1$, where $\phi(x; \theta, \sigma)$ is the pdf of a normal distribution with mean $\theta$ and variance $\sigma^2$.

Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from $f_{(\epsilon,\theta,\sigma)}(x)$ and that we wish to test the null hypothesis $\theta = 0$ versus $\theta > 0$, using the $t$-statistic $T_n = \sqrt{n}\,\bar{X}_n/S_n$, where $\bar{X}_n = (1/n)\sum_{i=1}^n X_i$ and $S_n^2 = (1/n)\sum_{i=1}^n (X_i - \bar{X}_n)^2$. The robustness of this $t$-test as measured by Pitman efficiency has been studied in the famous Princeton study by Andrews et al. (1972). In this paper we derive the large deviation rate function of $T_n$ under $P_{(\epsilon,0,\sigma)}$, which allows us to obtain the exact Bahadur slope of the $t$-test under a general alternative $P_{(\epsilon,\theta,\sigma)}$, $\theta > 0$. Following the practice of other authors, we set $\sigma$ equal to 3, and give the exact Bahadur slopes for various values of $\epsilon$ and $\theta$ in Table 1. This table gives an indication of the region of robustness of the $t$-test as measured by the exact Bahadur slope. The robustness of the $t$-test, in the sense of Bahadur efficiency, is gleaned by comparing the slope at the contaminated distribution $P_{(\epsilon,\theta,3)}$ with the slope at the uncontaminated distribution $P_{(0,\theta,3)}$. As expected, Table 1 shows that there is adequate robustness in a region of small values of $\epsilon$. Furthermore, for a fixed $\theta$ the slope is a decreasing function of $\epsilon$ and for a fixed $\epsilon$ the slope is an increasing function of $\theta$.

The exact distribution of $T_n$ under $P_{(\epsilon,0,\sigma)}$ has been derived in Lee and Gurland (1977). We will derive the large deviation rate function of $T_n$ under $P_{(\epsilon,0,\sigma)}$ and the exact Bahadur slope under the alternative $P_{(\epsilon,\theta,\sigma)}$ in Section 2.

* Corresponding author. Partially supported by the US Army research office grant no. DAAH04-96-1-0070.


2.  Large  deviation  rates  and  Bahadur  slopes 


We  refer  to  the  excellent  monograph  of  Varadhan  (1984)  for  an  introduction  to  the  theory  of  large  deviations 
and  to  the  monograph  of  Bahadur  (1971)  for  the  concept  of  Bahadur  slopes  and  efficiencies.  One  needs 
a  strong  law  under  the  alternative  and  a  large  deviation  result  under  the  null  hypothesis  to  obtain  the  exact 
Bahadur  slope.  It  is  easy  to  see  from  the  usual  strong  law  of  large  numbers  that 


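$$\frac{T_n}{\sqrt{n}} \to m(\epsilon, \theta, \sigma) = \frac{\theta}{\sqrt{(1 - \epsilon) + \epsilon\sigma^2}}, \tag{2}$$

with probability one under $P_{(\epsilon,\theta,\sigma)}$. We need to obtain a result of the form

$$\frac{1}{n} \log P_{(\epsilon,0,\sigma)}\left(\frac{T_n}{\sqrt{n}} \ge m\right) \to -y(m), \tag{3}$$

where $y(m)$ is continuous in $m$, which is usually referred to as the large deviation rate function of $T_n$. It then follows that the exact Bahadur slope of $T_n$ equals

$$c(\epsilon, \theta, \sigma) = 2\,y(m(\epsilon, \theta, \sigma)). \tag{4}$$

We now proceed with the derivation of $y(m)$. Note that the event $\{T_n^2/n \ge m^2\}$ is equal to the event $\{W_n \ge 0\}$, where $W_n$ is the quadratic form $W_n = X'AX/n$ with $A = J - naI$, $a = m^2/(1 + m^2)$, $I$ is the identity matrix and $J$ is a matrix of ones. Since the distribution of $T_n$ is symmetric under $P_{(\epsilon,0,\sigma)}$, we have

$$P_{(\epsilon,0,\sigma)}\left(\frac{T_n}{\sqrt{n}} \ge m\right) = \frac{1}{2}\,P_{(\epsilon,0,\sigma)}(W_n \ge 0). \tag{5}$$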


(From here onwards, $P$ without a suffix corresponds to the probability under $P_{(\epsilon,0,\sigma)}$.) The logarithm of the probability in (5) can be approximated (see (19) and (20) below) by using the moment generating function (mgf) of $W_n$, which is given by

$$M_n(t) = E[\exp(tW_n)] = \sum_{k=0}^{n} \binom{n}{k} (1 - \epsilon)^k \epsilon^{(n-k)} \left| I - \frac{2t}{n}\Lambda_k A \right|^{-1/2} \tag{6}$$

where $\Lambda_k = \mathrm{diag}(1, \ldots, 1, \sigma^2, \ldots, \sigma^2)$, with $k$ entries equal to 1 and $n - k$ entries equal to $\sigma^2$. Let $p = k/n$ and $q = 1 - p$. Using a matrix determinant formula (see the appendix), we can show that

$$M_{nk}(t) = \left| I - \frac{2t}{n}\Lambda_k A \right|^{-1/2} = (f_1(t))^{-np/2} (f_2(t))^{-nq/2} \left( \frac{p f_2(t) f_3(t) + q f_1(t) f_4(t)}{f_1(t) f_2(t)} \right)^{-1/2} \tag{7}$$

where $f_1(t) = 1 + 2at$, $f_2(t) = 1 + 2at\sigma^2$, $f_3(t) = 1 - 2t(1 - a)$ and $f_4(t) = 1 - 2t\sigma^2(1 - a)$. Thus, the mgf of $W_n$ is given by

$$M_n(t) = \sum_{k=0}^{n} \binom{n}{k} (1 - \epsilon)^k \epsilon^{(n-k)} M_{nk}(t) \quad \text{for } t_*(p) < t < t^*(p), \tag{8}$$

where $t_*(p)$, $t^*(p)$ are the roots of the quadratic equation $p f_2(t) f_3(t) + q f_1(t) f_4(t) = 0$.

From the above formula for the mgf $M_n(t)$, we can conclude that the distribution of $W_n$ is a mixture distribution. More precisely, let $K$ be a binomial random variable with parameters $n$ and $(1 - \epsilon)$. Given $K = k$, let $U_{nk}$ be a random variable with mgf given by $M_{nk}$. From (8) we can see that $W_n$ is equal in distribution to $U_{nK}$. This observation, coupled with a theorem of Varadhan (see Theorem 2.2 in Chaganty (1993)), is useful to derive the large deviation rate function for the random variable $W_n$. Theorem 1 below shows that the conditions in Varadhan's theorem are indeed satisfied in our problem.

Theorem 1. Let $K$ be a binomial random variable with parameters $n$ and $(1 - \epsilon)$. Given $K = k_n = np_n$, let $U_{nk_n}$ be a random variable with mgf $M_{nk_n}(t)$, defined in (7). If $p_n \to p$ then

$$F_n(p_n) = \frac{1}{n} \log P(U_{nk_n} \ge 0) \to F(p) \quad \text{as } n \to \infty, \tag{9}$$

where $F(p) = -\frac{1}{2}[p \log f_1(t^*(p)) + q \log f_2(t^*(p))]$, $q = 1 - p$.

Proof. Upper bound: By Chebyshev's inequality it follows that

$$\limsup_n \frac{1}{n} \log P(U_{nk_n} \ge 0) \le \lim_n \frac{1}{n} \log M_{nk_n}(t) = -\frac{1}{2}[p \log f_1(t) + q \log f_2(t)] \tag{10}$$

for any $0 < t < t^*(p)$. Hence,

$$\limsup_n F_n(p_n) = \limsup_n \frac{1}{n} \log P(U_{nk_n} \ge 0) \le \inf_{0 < t < t^*(p)} -\frac{1}{2}[p \log f_1(t) + q \log f_2(t)] = F(p). \tag{11}$$

Lower bound: Let $G_{nk_n}$ denote the distribution function of $U_{nk_n}$. Let us introduce another random variable $V_n$ with the conjugate distribution function given by

$$dH_n(x) = \frac{\exp(x t_n)\, dG_{nk_n}(x)}{M_{nk_n}(t_n)} \tag{12}$$

where $t_n = t^*(p)(1 - (1/n))$. Now for any $\delta > 0$ we have

$$P(U_{nk_n} \ge 0) = \int_0^\infty dG_{nk_n}(x) = M_{nk_n}(t_n) \int_0^\infty \exp(-x t_n)\, dH_n(x) \ge M_{nk_n}(t_n) \int_0^{n\delta} \exp(-x t_n)\, dH_n(x) \ge M_{nk_n}(t_n) \exp(-n\delta t_n)\, P(0 \le V_n \le n\delta). \tag{13}$$
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Therefore, 

\  log P(U„k,  log MnkStn)  +  \  log P(0  <  V„  < nS).  (14) 

Since  pn~*  p  as  it  follows  from  (7) 

^log Mnk„(t„)  -  [p  log  f\(t*(p))  +  q  log  f2(t*(p))]  =  F(p).  (15) 

n  L 

We  will  now  show  that  the  limiting  distribution  of  V„ln  is  a  translated  gamma  distribution.  To  find  the  limiting 
distribution,  we  first  note  that  the  mgf  of  V„jn  is  given  by  M„(s)  =  M„kXsn )/Mnk„(tn),  where  s„  =  t„  +s/n. 
It  is  easy  to  check  that 


M„(s)  — >  Af(s)  =  exp  (-sc) 


t*(p)-s 


for  s  <  t*(p),  where  c  =  [ap/(  1  +  2 at*(p))  +  aqa2/(  1  +  2 at*(p)a2)].  Thus,  V„/n  converges  in  distribution 
to  V  -  c,  where  V  is  a  Gamma  random  variable  with  shape  parameter  1/2  and  scale  parameter  1  /t*(p). 
Therefore, 


P(0^V„/n^d)—>P(c^V^c  +  d)>0  as  n— >oo.  (17) 

From  (14),  (15)  and  (17)  we  get 

lim  inf  Fn(pn )  =  lim inf  -  log P{Wnk,  > 0)>F(p)  -  6t*(p). 

n  n  ft 

Since  <5  is  arbitrary  we  get  lim  inf nF„(pn)^F(p).  This  completes  the  proof  of  the  theorem.  □ 


We are now in a position to derive the large deviation rate function $y(m)$ of $T_n$. From Theorem 1 we have,

$$F_n(p_n) = \frac{1}{n} \log P(W_n \ge 0 \mid K = np_n) \to F(p) \tag{18}$$

whenever $p_n \to p$. Note that

$$\frac{1}{n} \log P(W_n \ge 0) = \frac{1}{n} \log \int \exp(nF_n(p))\, d\mu_n(p), \tag{19}$$

where $\mu_n$ is the distribution of $K/n$. Since the distribution of $K$ is binomial, it is known that the sequence of probability measures $\{\mu_n\}$ obeys the large deviation principle (see Varadhan, 1984 for the definition) with rate function

$$h(p) = p \log(p/(1 - \epsilon)) + q \log(q/\epsilon).$$

Using the theorem of Varadhan, see Theorem 2.2 in Chaganty (1993), and (18) and (19) it follows that

$$\frac{1}{n} \log P(W_n \ge 0) \to \sup_{0 < p < 1} (F(p) - h(p)). \tag{20}$$

From (5) and (20) we get

$$\frac{1}{n} \log P\left(\frac{T_n}{\sqrt{n}} \ge m\right) \to -y(m),$$

where $y(m) = \inf_{0 < p < 1}[-F(p) + h(p)]$.
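The formula for $y(m)$ can be evaluated directly. The sketch below is an illustration rather than the authors' procedure: it uses a plain grid search over $p$, hypothetical function names, and assumes $0 < \epsilon < 1$ and $\theta > 0$. It takes $t^*(p)$ as the positive root of $p f_2 f_3 + q f_1 f_4 = 0$, written out as $At^2 + Bt + 1 = 0$, and uses $-F(p) = \frac{1}{2}[p \log f_1(t^*) + q \log f_2(t^*)]$ from Theorem 1.

```python
import math

def t_star(p, a, sigma):
    """Positive root of p*f2*f3 + q*f1*f4 = 0, expanded as A*t^2 + B*t + 1 = 0."""
    q = 1.0 - p
    s2 = sigma * sigma
    A = -4.0 * a * (1.0 - a) * s2  # negative, so the two roots have opposite signs
    B = p * (2.0 * a * s2 - 2.0 * (1.0 - a)) + q * (2.0 * a - 2.0 * s2 * (1.0 - a))
    return (-B - math.sqrt(B * B - 4.0 * A)) / (2.0 * A)

def neg_F(p, a, sigma):
    """-F(p) = (1/2)[p log f1(t*) + q log f2(t*)]."""
    t = t_star(p, a, sigma)
    f1 = 1.0 + 2.0 * a * t
    f2 = 1.0 + 2.0 * a * t * sigma * sigma
    return 0.5 * (p * math.log(f1) + (1.0 - p) * math.log(f2))

def h(p, eps):
    """Binomial rate function h(p) = p log(p/(1-eps)) + q log(q/eps), 0 < p < 1."""
    q = 1.0 - p
    return p * math.log(p / (1.0 - eps)) + q * math.log(q / eps)

def bahadur_slope(eps, theta, sigma, steps=20000):
    """c = 2*y(m), with y(m) = inf over 0 < p < 1 of [-F(p) + h(p)] on a grid."""
    m = theta / math.sqrt((1.0 - eps) + eps * sigma * sigma)
    a = m * m / (1.0 + m * m)
    y = min(neg_F(i / steps, a, sigma) + h(i / steps, eps) for i in range(1, steps))
    return 2.0 * y

print(bahadur_slope(0.05, 1.0, 3.0))
```

For $\epsilon = 0.05$, $\theta = 1$ and $\sigma = 3$ the result should agree with the corresponding entry of Table 1 (0.56738) to about three decimals; a finer grid or a root-finding step would be needed for more digits.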
Table 1
Slope of the $t$-statistic $c(\epsilon, \theta, \sigma)$, for the contaminated normal model, when $\sigma = 3$

$\epsilon$   $\theta=0.25$   $\theta=0.50$   $\theta=1.0$   $\theta=1.5$   $\theta=2.0$   $\theta=2.5$   $\theta=3.0$
0.00         0.06062         0.22314         0.69315        1.17865        1.60944        1.98100        2.30259
0.05         0.04487         0.17380         0.56738        0.99565        1.39154        1.74207        2.05046
0.10         0.03509         0.14056         0.48860        0.87952        1.24944        1.58306        1.88034
0.15         0.02865         0.11598         0.42937        0.79694        1.14852        1.46908        1.75733
0.25         0.02090         0.08422         0.33264        0.67161        1.00633        1.31239        1.58918

The rate function $y(m)$ can easily be computed numerically using the Newton-Raphson method. In Table 1 we present the exact Bahadur slope, $c(\epsilon, \theta, \sigma) = 2y(m(\epsilon, \theta, \sigma))$, for different values of $\epsilon$ and $\theta$ when $\sigma = 3$. Note that a large value of $c(\epsilon, \theta, \sigma)$ indicates that the test statistic $T_n$ requires a smaller sample size to detect that particular alternative. The Bahadur efficiency of the $t$-test with respect to the competing nonparametric Wilcoxon test in the Tukey model has recently been obtained in Chaganty and Sethuraman (1996).
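The $\epsilon = 0.00$ row of Table 1 admits a simple cross-check. When there is no contamination, only $p = 1$ contributes, $t^*(1) = 1/(2(1 - a))$, and $-F(1) = \frac{1}{2}\log(1 + m^2)$ with $m = \theta$, so the slope reduces to $\log(1 + \theta^2)$. This reduction is worked out here from the formulas above and is not stated in the text; the names in the sketch are illustrative.

```python
import math

# First row of Table 1 (eps = 0.00), keyed by theta.
table_row_eps0 = {0.25: 0.06062, 0.50: 0.22314, 1.0: 0.69315, 1.5: 1.17865,
                  2.0: 1.60944, 2.5: 1.98100, 3.0: 2.30259}

def slope_uncontaminated(theta):
    """Slope log(1 + theta^2) of the t-statistic when eps = 0 (so m = theta)."""
    return math.log(1.0 + theta * theta)

max_gap = max(abs(slope_uncontaminated(th) - c) for th, c in table_row_eps0.items())
print(max_gap)  # largest difference from the tabulated values
```

The differences stay within the rounding of the five printed decimals.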


Remark 1. It is possible to derive, in a similar manner, the exact Bahadur slope of the $t$-statistic, for a random sample of $n$ observations with common pdf given by $f(x) = \sum_{l=1}^{m} \pi_l\,\phi(x; \theta, \sigma_l)$, $m \ge 1$, and $\pi_l > 0$ for all $l \ge 1$. In this case the multinomial distribution plays the role of the binomial distribution in the derivation of the slope. More generally, using the results of Chaganty (1993), we can also establish the large deviation principle for the $t$-statistic for this model.


Appendix 

In (7) we have used the following determinant formula. Let

$$B = \begin{bmatrix} bI + cJ & cJ \\ eJ & dI + eJ \end{bmatrix}$$

with diagonal blocks of orders $k$ and $(n - k)$, where $b$, $c$, $d$ and $e$ are constants, and as before, $I$ is the identity matrix and $J$ is the matrix of ones. Then we can verify that

$$|B| = b^k d^{n-k} \left( 1 + \frac{kc}{b} + \frac{(n - k)e}{d} \right). \tag{A.1}$$

To obtain the simplification in Eq. (7), we use the above formula (A.1) with the substitutions $b = f_1(t)$, $d = f_2(t)$, $c = -2t/n$ and $e = -2t\sigma^2/n$.
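Formula (A.1) is easy to spot-check numerically. The sketch below builds the block matrix for small $n$ and $k$ and compares a determinant computed by Gaussian elimination with the closed form; the helper names are hypothetical and the elimination routine is a generic textbook one.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def block_matrix(n, k, b, c, d, e):
    """First k rows: c everywhere plus b on the diagonal; last n-k rows: e plus d."""
    rows = []
    for i in range(n):
        diag, fill = (b, c) if i < k else (d, e)
        rows.append([fill + (diag if i == j else 0.0) for j in range(n)])
    return rows

def closed_form(n, k, b, c, d, e):
    """Right-hand side of (A.1)."""
    return b ** k * d ** (n - k) * (1.0 + k * c / b + (n - k) * e / d)

args = (6, 2, 1.3, -0.2, 2.1, -0.4)
print(determinant(block_matrix(*args)), closed_form(*args))
```

The formula is an instance of the rank-one determinant identity, since $B$ is a diagonal matrix plus the outer product of a column vector with the vector of ones.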
Abstract: We examine ... the $t$-test, the Wilcoxon test and the sign test ... central tendency of a distribution ... by comparing the Bahadur slopes in a contaminated normal model. ... the large deviation principle (LDP) and then calculate the ... for the standard test statistics when the observations come from a contaminated normal distribution. An examination of tables of ... that the Wilcoxon ... neighborhood of the null hypothesis, even ... alternative hypothesis.

Keywords and phrases: Bahadur slope, large deviations, Pitman efficiency, robustness, Tukey model, Wilcoxon test

16.1 Introduction

One of the most common testing problems encountered in statistics is testing

$$H_0 : \theta = 0 \quad \text{vs.} \quad H_1 : \theta > 0$$

where $\theta$ is a measure of central tendency, ... assumption that the ... sample from a normal distribution with unknown variance. In this case the $t$-test is known to be the uniformly ... that have been proposed include the ...
Huber (1981) and Tiku, Tan and Balakrishnan (1986), and more recently by DasGupta (1994). It has been the standard practice to examine the robustness of these tests in the famous Tukey model [see Tukey (1960)], which models a certain form of departure from normality. Under the Tukey model, the sample consists of i.i.d. observations from the density

$$f_{\epsilon,\theta,\sigma}(x) = (1 - \epsilon)\,\phi(x; \theta, 1) + \epsilon\,\phi(x; \theta, \sigma). \tag{16.1}$$

Here $\phi(x; \theta, \sigma)$ denotes the probability density function of a normal random variable with mean $\theta$ and standard deviation $\sigma$, and $\epsilon \in (0, 1)$ represents the level of contamination.

Two measures which are commonly used to compare the large sample performance of tests are Pitman efficiency and Bahadur efficiency. In Andrews et al. (1972), Huber (1981) and Lehmann (1983, Chapter 5), robustness was measured by Pitman efficiency, which is obtainable by comparing asymptotic efficacies of tests. In this paper, we measure the robustness of these tests by Bahadur efficiency, which is obtainable by comparing Bahadur slopes. We present some tables showing the Bahadur efficiencies of the Wilcoxon test relative to the other three tests. From an examination of these tables, it appears that the Wilcoxon test is the best performer in a neighborhood of the null hypothesis, even under the presence of moderate contamination, but is not the best performer uniformly over the whole region of the alternative hypothesis.

The concept of Bahadur slope can be briefly described as follows. Let $X_1, \ldots, X_n$ be i.i.d., whose distribution depends on a parameter $\lambda$ taking values in a set $\Lambda$. The parameter $\lambda$ can be a vector like $(\epsilon, \theta, \sigma)$ as occurs in our problem. Consider the problem of testing the hypothesis that $\lambda$ lies in a subset $\Lambda_0$ of $\Lambda$. For each $n$, let $T_n$ be a real valued function of the sample $\{X_1, X_2, \ldots, X_n\}$ such that large values of $T_n$ are significant for testing the null hypothesis. For any $\lambda$ and $t$, let

$$F_n(t, \lambda) = P_\lambda(T_n < t) \tag{16.2}$$

and

$$G_n(t) = \inf\{F_n(t, \lambda) : \lambda \in \Lambda_0\}. \tag{16.3}$$

If $\Lambda_0$ were a singleton, then $F_n(t, \lambda)$ and $G_n(t)$ are equal; otherwise the significance probability of a test based on $T_n$ is obtained from $G_n(t)$. In fact, the level attained by $T_n$ is

$$L_n(T_n) = 1 - G_n(T_n). \tag{16.4}$$

The rate at which $L_n$ tends to zero when a non-null $\lambda$ obtains is a measure of the discriminating power of the sequence of test statistics $\{T_n\}$ in discriminating ... $\lambda$; see Bahadur (1960, 1967, 1971). The sequence of test statistics $\{T_n\}$ is said to have exact slope $c(\lambda)$ when $\lambda$ obtains if

$$\lim_{n \to \infty} \frac{1}{n} \log L_n(T_n) = -\frac{1}{2}\,c(\lambda) \quad \text{a.s. } [P_\lambda]. \tag{16.5}$$

... $c(\lambda)$ for $\lambda \in \Lambda \setminus \Lambda_0$ that are of interest, with larger values ...

Theorem 16.1.1 Suppose that for each $\lambda \in \Lambda \setminus \Lambda_0$,

$$\lim_{n \to \infty} T_n = b(\lambda) \quad \text{a.s. } [P_\lambda] \tag{16.6}$$

where $-\infty < b(\lambda) < \infty$. Suppose that for $\lambda \in \Lambda_0$

$$\lim_{n \to \infty} \frac{1}{n} \log L_n(s) = -f(s) \quad \text{a.s. } [P_\lambda] \tag{16.7}$$

for each $s$ in an open interval which includes $b(\lambda)$ and $f(s)$ is a positive continuous function on that interval. Then, the exact slope of $\{T_n\}$ exists for $\lambda \in \Lambda \setminus \Lambda_0$ and equals $c(\lambda) = 2f(b(\lambda))$.

In practice, verification of (16.6) ... a strong law of large numbers. ... sums of i.i.d. random variables, one can use Cramér-Chernoff's ... large deviation principle ... state the main theorem of Ellis ... A function $I$ is said to be a rate function if it is lower semi-continuous. For any subset $A$, we write $I(A) = \inf\{I(x) : x \in A\}$. ... if the following conditions are satisfied:

$$\limsup_n \frac{1}{n} \log \mu_n(C) \le -I(C) \tag{16.8}$$

$$\liminf_n \frac{1}{n} \log \mu_n(G) \ge -I(G) \tag{16.9}$$

for all closed sets $C$ and for all open sets $G$, respectively. ...
Let $\{T_n\}$ be a sequence of $\mathbb{R}^k$ valued random variables with distribution given by the sequence of probability measures $\{\mu_n\}$. Define

$$c_n(t) = \frac{1}{n} \log E[\exp(\langle t, nT_n \rangle)]. \tag{16.10}$$

Suppose that $\lim_n c_n(t)$ exists and is equal to $c(t)$, for all $t \in \mathbb{R}^k$, where we allow both $c_n(t)$ and $c(t)$ to take the value $+\infty$. Let $\mathcal{D}(c) = \{t \in \mathbb{R}^k : c(t) < \infty\}$. The function $c(t) : \mathbb{R}^k \to \mathbb{R}$ is said to be closed if $\{t : c(t) \le a\}$ is closed for each real $a$. This is equivalent to $c(t)$ being lower semi-continuous. If $c(t)$ is differentiable on the interior of $\mathcal{D}(c)$, then we call $c(t)$ steep if $\|\mathrm{grad}(c(t_n))\| \to \infty$ for any sequence $\{t_n\} \subset \mathrm{int}(\mathcal{D}(c))$ which tends to a boundary point of $\mathcal{D}(c)$.

Let

$$I(s) = \sup_{t \in \mathbb{R}^k} [\langle t, s \rangle - c(t)], \tag{16.11}$$

for $s \in \mathbb{R}^k$, be the Legendre-Fenchel transform of $c(t)$. The main theorem of Ellis (1984) can then be stated as follows.

Theorem 16.1.2 (Ellis): If $\mathcal{D}(c)$ has a nonempty interior containing the point $t = 0$ and $c(t)$ is a closed convex function on $\mathbb{R}^k$, then the function $I(s)$ defined in (16.11) is a proper rate function on $\mathbb{R}^k$ and the sequence of probability measures $\{\mu_n\}$ satisfies the upper bound (16.8) for all closed sets $C$ of $\mathbb{R}^k$ with proper rate function $I(s)$. Furthermore, if $c(t)$ is differentiable on all of the interior of $\mathcal{D}(c)$ and is steep, then the sequence of probability measures $\{\mu_n\}$ satisfies the lower bound (16.9) for all open sets $G$ of $\mathbb{R}^k$ with proper rate function $I(s)$.
Let $X_1, X_2, \ldots, X_n$ be a random sample from the distribution (16.1). In this section we will first establish the LDP for the commonly used test statistics for testing the hypothesis $H_0 : \theta = 0$ vs. $H_1 : \theta > 0$. The LDP results for the Wilcoxon and $t$-statistics are new. We do this even though the full force of the LDP is not required to calculate Bahadur slopes.

We will consider four test statistics: the mean test, the $t$-test, the Wilcoxon test, and the sign test, the last two of which are nonparametric tests.

Mean test. The test statistic (under the assumption that the population variance is known) for the mean test is $T_{1n} = \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$. Under the null hypothesis $H_0 : \theta = 0$, we have
$$c_{1n}(t) = \frac{1}{n} \log E[\exp(n t T_{1n})] = \frac{t^2}{2} + \log\left[(1 - \epsilon) + \epsilon \exp\left(\frac{t^2(\sigma^2 - 1)}{2}\right)\right] = c_1(t) \ \text{(say)}, \quad -\infty < t < \infty. \tag{16.12}$$

It is easy to verify that the function $c_1(t)$ is a closed convex function on the real line, and satisfies the hypothesis of Theorem 16.1.2. Therefore, $T_{1n}$ obeys the LDP with proper rate function

$$I_1(s) = \sup_{-\infty < t < \infty} [s t - c_1(t)] = s t_s - c_1(t_s), \tag{16.13}$$

where $t_s$ satisfies the equation $s = c_1'(t_s)$, which simplifies to

$$e^{t_s^2/2} (1 - \epsilon)(s - t_s) + \epsilon\, e^{\sigma^2 t_s^2/2} (s - t_s \sigma^2) = 0. \tag{16.14}$$

The above equation (16.14) can be solved numerically using the Newton-Raphson method.
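The Newton-Raphson step mentioned above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names (`c1`, `solve_ts`, `rate_mean_test`) and the bisection safeguard are assumptions introduced here, and (16.14) is divided by $e^{t^2/2}$ before solving so that large $t$ does not overflow.

```python
import math

def c1(t, eps, sigma):
    """Limiting cumulant generating function c_1(t) from (16.12)."""
    return 0.5 * t * t + math.log((1 - eps) + eps * math.exp(0.5 * t * t * (sigma ** 2 - 1)))

def g(t, s, eps, sigma):
    """Left-hand side of (16.14) divided by exp(t^2 / 2)."""
    w = eps * math.exp(0.5 * t * t * (sigma ** 2 - 1))
    return (1 - eps) * (s - t) + w * (s - t * sigma ** 2)

def dg(t, s, eps, sigma):
    """Derivative of g with respect to t."""
    w = eps * math.exp(0.5 * t * t * (sigma ** 2 - 1))
    dw = w * t * (sigma ** 2 - 1)
    return -(1 - eps) + dw * (s - t * sigma ** 2) - w * sigma ** 2

def solve_ts(s, eps, sigma, tol=1e-12, max_iter=200):
    """Solve (16.14) for t_s by Newton-Raphson, falling back to bisection."""
    # The root lies between s / sigma^2 and s, where g changes sign.
    a, b = sorted((s, s / sigma ** 2))
    if a == b:  # s == 0 or sigma == 1
        return s
    t = 0.5 * (a + b)
    for _ in range(max_iter):
        gt = g(t, s, eps, sigma)
        if gt > 0:  # c_1'(t) < s, so the root is to the right of t
            a = t
        else:
            b = t
        d = dg(t, s, eps, sigma)
        t_new = t - gt / d if d != 0 else 0.5 * (a + b)
        if not (a <= t_new <= b):  # Newton left the bracket: bisect instead
            t_new = 0.5 * (a + b)
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t

def rate_mean_test(s, eps, sigma):
    """Rate function I_1(s) = s t_s - c_1(t_s) of (16.13)."""
    ts = solve_ts(s, eps, sigma)
    return s * ts - c1(ts, eps, sigma)
```

With no contamination ($\epsilon = 0$) the sketch reduces to the standard normal case, where $t_s = s$ and $I_1(s) = s^2/2$, which gives a quick sanity check.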

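Sign test. Let $V_i = 1$ if $X_i > 0$ and $V_i = 0$ if $X_i \le 0$. The nonparametric sign test is based on the statistic $T_{2n} = \frac{1}{n} \sum_{i=1}^n V_i$. Note that under the null hypothesis the random variables $V_1, \ldots, V_n$ are i.i.d. Bernoulli with success probability $1/2$. Hence $T_{2n}$ obeys the LDP with proper rate function given by

$$I_2(s) = \begin{cases} \log 2 + s \log s + (1 - s) \log(1 - s) & \text{if } 0 \le s \le 1, \\ \infty & \text{otherwise.} \end{cases} \tag{16.15}$$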

Wilcoxon test. Arrange $|X_1|, \ldots, |X_n|$ in increasing order and assign ranks $1, \ldots, n$. Let $U_i$ be the sign of $X_j$ where $|X_j|$ has rank $i$. The Wilcoxon statistic is equivalent to

$$T_{3n} = \frac{1}{n(n + 1)} \sum_{i=1}^n i\, U_i. \tag{16.16}$$

The following theorem generalizes a result of Klotz (1965) and gives the LDP for the Wilcoxon statistic.
Theorem 16.2.1 Let $E_{ni}$ denote the expected value of the $i$th smallest order statistic from a sample of $n$ observations with distribution function $G$ on $(0, \infty)$ satisfying $\int_0^\infty x\, dG(x) < \infty$. Let $U_i$ be independent such that $P(U_i = \pm 1) = 1/2$ for $i = 1, 2, \ldots, n$ and let $S_n = \frac{1}{n} \sum_{i=1}^n E_{ni} U_i$. Then $\{S_n\}$ obeys the LDP with proper rate function given by

$$I(s) = \sup_t \left[ s t - \int_0^\infty \log(\cosh(x t))\, dG(x) \right]. \tag{16.17}$$

Proof. From Theorem 1 of Hoeffding (1953), we have for each $t \in \mathbb{R}$

$$c_{3n}(t) = \frac{1}{n} \log E[\exp(n t S_n)] = \frac{1}{n} \sum_{i=1}^n \log(\cosh(t E_{ni})) \to \int_0^\infty \log(\cosh(t z))\, dG(z) = c_3(t) \ \text{(say)} \tag{16.18}$$

as $n \to \infty$. It is easy to check that the function $c_3(t)$ satisfies the conditions of Theorem 16.1.2. Theorem 16.2.1 now follows from Theorem 16.1.2. ■


Note that under the null hypothesis $\theta = 0$, the random variables $U_1, \ldots, U_n$ in (16.16) are i.i.d. symmetric Bernoulli. Also in (16.16), $E_{ni} = i/(n + 1)$ is the expected value of the $i$th smallest order statistic from a random sample of $n$ observations distributed uniformly on $(0, 1)$. Therefore, the conditions of Theorem 16.2.1 apply with $G$ as the uniform cdf on $(0, 1)$. Let

$$c_3(t) = \int_0^1 \log(\cosh(t x))\, dx, \quad -\infty < t < \infty. \tag{16.19}$$

Using Theorem 16.2.1, we can conclude that $T_{3n}$ obeys the LDP with proper rate function

$$I_3(s) = \sup_t [s t - c_3(t)] = s t_s - c_3(t_s), \tag{16.20}$$

where $t_s$ is the solution of the equation

$$s = c_3'(t) = \int_0^1 x \tanh(t x)\, dx = \frac{1}{2} - \frac{\pi^2}{24 t^2} + \frac{\log(1 + \exp(-2t))}{t} + \frac{1}{2 t^2} \sum_{k=1}^\infty \frac{(-1)^{k+1} \exp(-2 t k)}{k^2}. \tag{16.21}$$

Using integration by parts, we can rewrite (16.20) as

$$I_3(s) = 2 s t_s - \log(\cosh(t_s)), \tag{16.22}$$

where $t_s$ is the solution of the equation (16.21).
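A small numerical sketch of the Wilcoxon rate function may help here. It assumes the forms $s = c_3'(t) = \int_0^1 x \tanh(tx)\,dx$ and $I_3(s) = 2 s t_s - \log\cosh(t_s)$; the function names, the use of Simpson's rule and the bisection solver are illustrative choices made here, not part of the original text.

```python
import math

def c3_prime(t, n=2000):
    """c_3'(t) = integral over (0, 1) of x * tanh(t x), by composite Simpson's rule."""
    h = 1.0 / n
    total = 0.0
    for i in range(n + 1):
        x = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * x * math.tanh(t * x)
    return total * h / 3.0

def c3_prime_series(t, terms=200):
    """Series form of c_3'(t) for t > 0, following (16.21)."""
    tail = sum((-1) ** (k + 1) * math.exp(-2 * t * k) / k ** 2 for k in range(1, terms + 1))
    return 0.5 - math.pi ** 2 / (24 * t * t) + math.log1p(math.exp(-2 * t)) / t + tail / (2 * t * t)

def solve_ts(s):
    """Solve s = c_3'(t) for t >= 0 by bisection; requires 0 <= s < 1/2."""
    if not 0 <= s < 0.5:
        raise ValueError("s must lie in [0, 1/2)")
    lo, hi = 0.0, 1.0
    while c3_prime(hi) < s:  # c_3' increases from 0 towards 1/2
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if c3_prime(mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def log_cosh(t):
    """Overflow-safe log(cosh(t))."""
    t = abs(t)
    return t + math.log1p(math.exp(-2 * t)) - math.log(2)

def rate_wilcoxon(s):
    """I_3(s) = 2 s t_s - log cosh(t_s); the rate function is symmetric in s."""
    s = abs(s)
    ts = solve_ts(s)
    return 2 * s * ts - log_cosh(ts)
```

The quadrature and the series are two independent routes to $c_3'(t)$, so comparing them at a moderate $t$ is a cheap consistency check on the series expansion.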

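$t$-test. Let $\bar{X}_n$ and $S_n^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}_n)^2$ be the mean and variance of the sample. The $t$-statistic is $T_{4n} = \bar{X}_n / S_n$. The LDP for the $t$-statistic does not follow from Theorem 16.1.2. However, we can establish the LDP for the $t$-statistic using the large deviation theorem of Chaganty (1997). Let $K_n$ be distributed as binomial with parameters $n$ and $(1 - \epsilon)$. Let $Z_1, Z_2, \ldots$ be i.i.d. $N(0, 1)$ and let $Y_1, Y_2, \ldots$ be i.i.d. $N(0, \sigma^2)$, independent of [...]. Let $\bar{Z}_n$, $\bar{Y}_n$ and let $S_{zn}^2$ and $S_{yn}^2$ be the sample means and variances of a sample of $n$ observations from $Z$ and $Y$, respectively. Note that under the null hypothesis $T_{4n}$ is equal in distribution to the statistic

$$\frac{P_n \bar{Z}_{K_n} + (1 - P_n) \bar{Y}_{n - K_n}}{\sqrt{P_n S_{z K_n}^2 + (1 - P_n) S_{y, n - K_n}^2 + P_n (1 - P_n)(\bar{Z}_{K_n} - \bar{Y}_{n - K_n})^2}},$$

where $P_n = K_n / n$. [...] obey the LDP with proper rate functions $h_1(z)$, $h_2(y)$, $h_3(u)$ and $h_4(v)$ [...]. Conditional on $P_n = p$, the [...] are all independent. Using [...] Lynch and Sethuraman (1987) and Example 3.11 in Chaganty (1997), we can see that this conditional joint distribution obeys the LDP continuity condition in $p$ with proper rate function given by $h_1(z) + h_2(y) + h_3(u) + h_4(v)$. See Chaganty (1997) for the definition of the LDP continuity condition and the contraction principle in that connection. It follows that the conditional distribution of $T_{4n}$ obeys the LDP continuity condition in $p$ with proper rate function given by

$$J(p, s) = \inf_{\left\{ (z, y, u, v) \,:\, \frac{p z + (1 - p) y}{\sqrt{p u + (1 - p) v + p (1 - p)(z - y)^2}} = s \right\}} [h_1(z) + h_2(y) + h_3(u) + h_4(v)]. \tag{16.23}$$

From the LDP for the binomial distribution, $P_n$ obeys the LDP with proper rate function $p \log\frac{p}{1 - \epsilon} + (1 - p) \log\frac{1 - p}{\epsilon}$. It then follows from Theorem [...] of Chaganty (1997) that $T_{4n}$ obeys the LDP with proper rate function

$$I_4(s) = \inf_{0 < p < 1} \left[ J(p, s) + p \log\frac{p}{1 - \epsilon} + (1 - p) \log\frac{1 - p}{\epsilon} \right], \tag{16.24}$$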
where $J(p, s)$ is given by (16.23). The above expression (16.24) for $I_4(s)$ is not convenient for computational purposes. However, using a different approach, Chaganty and Sethuraman (1997) have derived the following equivalent form for the rate function:

$$I_4(s) = \inf_{0 < p < 1} \left\{ \frac{1}{2} \left[ p \log(1 + 2 a t^*) + (1 - p) \log(1 + 2 a \sigma^2 t^*) \right] + p \log\frac{p}{1 - \epsilon} + (1 - p) \log\frac{1 - p}{\epsilon} \right\}, \tag{16.25}$$

where

$$a = \frac{s^2}{1 + s^2}, \qquad t^* = \frac{\sigma^2 (p + a - 1) + (a - p) + \sqrt{(\sigma^2 (p + a - 1) + a - p)^2 + 4 a \sigma^2 (1 - a)}}{4 a \sigma^2 (1 - a)}.$$

We use the expression in (16.25) in our calculations.
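As a rough illustration of how (16.25) might be evaluated, the sketch below minimizes over $p$ on a grid. It assumes that the quantity inside the infimum is $\frac{1}{2}[p \log(1 + 2at^*) + (1 - p)\log(1 + 2a\sigma^2 t^*)]$ plus the binomial rate term $p \log\frac{p}{1-\epsilon} + (1-p)\log\frac{1-p}{\epsilon}$, with $a = s^2/(1+s^2)$ and $t^*$ the positive root of $4a\sigma^2(1-a)t^2 - 2[\sigma^2(p+a-1) + (a-p)]t - 1 = 0$. The function names and the grid search are illustrative assumptions, not the authors' procedure.

```python
import math

def t_star(p, a, sigma):
    """Positive root t* appearing in (16.25)."""
    s2 = sigma ** 2
    b = s2 * (p + a - 1) + (a - p)
    return (b + math.sqrt(b * b + 4 * a * s2 * (1 - a))) / (4 * a * s2 * (1 - a))

def objective(p, s, eps, sigma):
    """Quantity minimized over p in (16.25), under the assumptions stated above."""
    a = s * s / (1 + s * s)
    t = t_star(p, a, sigma)
    gaussian_part = 0.5 * (p * math.log(1 + 2 * a * t)
                           + (1 - p) * math.log(1 + 2 * a * sigma ** 2 * t))
    binomial_part = p * math.log(p / (1 - eps)) + (1 - p) * math.log((1 - p) / eps)
    return gaussian_part + binomial_part

def rate_t_statistic(s, eps, sigma, grid=20000):
    """Approximate I_4(s) by a grid search over 0 < p < 1 (requires 0 < eps < 1)."""
    if s == 0:
        return 0.0
    return min(objective(k / grid, s, eps, sigma) for k in range(1, grid))
```

When $\sigma = 1$ the mixture is exactly standard normal, and the sketch returns $\frac{1}{2}\log(1 + s^2)$ for every $\epsilon$, which is a convenient check on the reconstruction.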


16.3  Bahadur  Slopes  and  Efficiencies 

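We now derive the Bahadur slopes of the common test statistics for testing $H_0 : \theta = 0$ vs. $H_1 : \theta > 0$ in the Tukey model, using the results of Section 16.2. These slopes will depend on the alternative hypothesis, i.e., on the vector $\lambda = (\theta, \epsilon, \sigma)$ with $\theta > 0$.

1. The Bahadur slope of the mean test is

$$c_m(\lambda) = 2[\theta t_\lambda - c_1(t_\lambda)], \tag{16.26}$$

where $c_1(t)$ is defined in (16.12) and $t_\lambda$ satisfies the equation

$$e^{t_\lambda^2/2}(1 - \epsilon)(\theta - t_\lambda) + \epsilon\, e^{\sigma^2 t_\lambda^2/2}(\theta - t_\lambda \sigma^2) = 0. \tag{16.27}$$

2. The Bahadur slope of the sign test is

$$c_s(\lambda) = 2[\log 2 + p_\lambda \log(p_\lambda) + q_\lambda \log(q_\lambda)], \tag{16.28}$$

where $p_\lambda = (1 - \epsilon)\Phi(\theta) + \epsilon\, \Phi(\theta/\sigma)$, $q_\lambda = 1 - p_\lambda$ and $\Phi$ is the cdf of the standard normal distribution.

3. The Bahadur slope of the Wilcoxon test is

$$c_w(\lambda) = 2[2 b(\lambda) t_\lambda - \log(\cosh(t_\lambda))], \tag{16.29}$$

where

$$b(\lambda) = \epsilon^2 \Phi\left(\frac{\sqrt{2}\,\theta}{\sigma}\right) + 2\epsilon(1 - \epsilon) \Phi\left(\frac{2\theta}{\sqrt{1 + \sigma^2}}\right) + (1 - \epsilon)^2 \Phi(\sqrt{2}\,\theta) - \frac{1}{2} \tag{16.30}$$

and $t_\lambda$ is the solution of the equation (16.21) with $s = b(\lambda)$.

4. The Bahadur slope of the $t$-statistic is

$$c_t(\lambda) = 2 I_4(b(\lambda)), \tag{16.31}$$

where $b(\lambda) = \theta / \sqrt{(1 - \epsilon) + \epsilon \sigma^2}$ and $I_4(s)$ is given by (16.24).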
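The closed-form ingredients of these slopes are easy to compute directly. The sketch below, with illustrative function names chosen here, evaluates the sign-test slope and the two limits $b(\lambda)$ used for the Wilcoxon and $t$ slopes. It assumes the Tukey model $(1-\epsilon)N(\theta, 1) + \epsilon N(\theta, \sigma^2)$, so that the Wilcoxon limit is $P(X_1 + X_2 > 0) - \frac{1}{2}$ and the $t$ limit is $\theta$ divided by the standard deviation of the mixture.

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sign_slope(theta, eps, sigma):
    """Bahadur slope of the sign test: 2 [log 2 + p log p + q log q]."""
    p = (1 - eps) * Phi(theta) + eps * Phi(theta / sigma)
    q = 1 - p
    return 2 * (math.log(2) + p * math.log(p) + q * math.log(q))

def b_wilcoxon(theta, eps, sigma):
    """Limit of the Wilcoxon statistic under the alternative: P(X1 + X2 > 0) - 1/2."""
    return (eps ** 2 * Phi(math.sqrt(2) * theta / sigma)
            + 2 * eps * (1 - eps) * Phi(2 * theta / math.sqrt(1 + sigma ** 2))
            + (1 - eps) ** 2 * Phi(math.sqrt(2) * theta) - 0.5)

def b_t(theta, eps, sigma):
    """Limit of the t-statistic under the alternative: theta / sd of the mixture."""
    return theta / math.sqrt((1 - eps) + eps * sigma ** 2)
```

Both the sign-test slope and the Wilcoxon limit vanish at $\theta = 0$, as they should under the null hypothesis. For very large $\theta$ the value $q_\lambda$ can underflow to zero, so the sketch is meant for moderate alternatives only.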

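The Bahadur efficiencies of the mean test, $t$-test, and the sign test with respect to the Wilcoxon test are defined as the ratios of the slopes, and they are given by $e_\lambda(m, w) = c_m(\lambda)/c_w(\lambda)$, $e_\lambda(t, w) = c_t(\lambda)/c_w(\lambda)$ and $e_\lambda(s, w) = c_s(\lambda)/c_w(\lambda)$, respectively. The Pitman efficiencies of these test statistics can be [...]. The Pitman efficiencies of the mean test and the $t$-test with respect to the Wilcoxon test coincide, and the common value is given by

$$e^P_\lambda(m, w) = e^P_\lambda(t, w) = \frac{\pi}{3}\, [(1 - \epsilon) + \epsilon \sigma^2]^{-1} \left[ (1 - \epsilon)^2 + \frac{2\sqrt{2}\,\epsilon(1 - \epsilon)}{\sqrt{1 + \sigma^2}} + \frac{\epsilon^2}{\sigma} \right]^{-2}, \tag{16.32}$$

whereas the Pitman efficiency of the sign test with respect to the Wilcoxon test is

$$e^P_\lambda(s, w) = \frac{2}{3} \left[ (1 - \epsilon) + \frac{\epsilon}{\sigma} \right]^2 \left[ (1 - \epsilon)^2 + \frac{2\sqrt{2}\,\epsilon(1 - \epsilon)}{\sqrt{1 + \sigma^2}} + \frac{\epsilon^2}{\sigma} \right]^{-2}. \tag{16.33}$$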
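The two Pitman efficiencies can be evaluated directly as a check. The sketch below assumes the standard efficacy expressions for the contaminated normal model: $1/\mathrm{Var}(X)$ for the mean and $t$ tests, $12(\int f^2)^2$ for the Wilcoxon test and $4 f(0)^2$ for the sign test. The function names are illustrative, and the value $\sigma = 3$ used in the usage note is an assumption; it is the value for which these formulas reproduce the $\theta = 0$ rows of Tables 16.1-16.3.

```python
import math

def _wilcoxon_bracket(eps, sigma):
    """(1-eps)^2 + 2*sqrt(2)*eps*(1-eps)/sqrt(1+sigma^2) + eps^2/sigma."""
    return ((1 - eps) ** 2
            + 2 * math.sqrt(2) * eps * (1 - eps) / math.sqrt(1 + sigma ** 2)
            + eps ** 2 / sigma)

def pitman_mean_vs_wilcoxon(eps, sigma):
    """Pitman efficiency of the mean (and t) test relative to the Wilcoxon test."""
    variance = (1 - eps) + eps * sigma ** 2
    return math.pi / (3 * variance * _wilcoxon_bracket(eps, sigma) ** 2)

def pitman_sign_vs_wilcoxon(eps, sigma):
    """Pitman efficiency of the sign test relative to the Wilcoxon test."""
    return 2 * ((1 - eps) + eps / sigma) ** 2 / (3 * _wilcoxon_bracket(eps, sigma) ** 2)
```

For example, `pitman_mean_vs_wilcoxon(0.05, 3.0)` and `pitman_sign_vs_wilcoxon(0.05, 3.0)` agree with the tabulated values 0.83615 and 0.69638 at 5% contamination to the printed precision.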


Following the convention set in Andrews et al. (1972), we took $\sigma = 3$ and computed the Bahadur efficiencies $e_\lambda(m, w)$, $e_\lambda(t, w)$ and $e_\lambda(s, w)$ [...]. Figures [...] give the surface [...] more information by looking at the performances of [...]. From a numerical point of view, the corresponding Pitman efficiencies are given by the restriction of these surfaces to the plane $\theta = 0$. The fact that the limiting Bahadur efficiency as $\theta \to 0$ yields the Pitman efficiency has been established in great generality in Wieand (1976), and we conjecture that it is true in this case also. It is clear that the Bahadur efficiencies of the mean test, $t$-test and the sign test with respect to the Wilcoxon test are less than 1 in a neighborhood of $\theta = 0$ and $\epsilon = 0$, but not on the whole region of alternatives. This leads us to the conclusion that the Wilcoxon test outperforms the remaining tests in a neighborhood of the null hypothesis even under the presence of moderate contamination.
Table 16.1: Bahadur efficiencies of the mean test, t-test and the sign test with respect to the Wilcoxon test when the level of contamination is 5%

ε       θ        e(m, w)    e(t, w)    e(s, w)
0.05    0.000    0.83615    0.83615    0.69638
        0.250    0.84208    0.86082    0.70317
        0.500    0.86294    0.89467    0.72332
        1.000    0.97967    0.94991    0.79873
        1.500    1.23220    1.07476    0.90046
        2.000    1.61877    1.26326    0.98765
        2.500    2.08962    1.46462    1.02746
        3.000    2.59458    1.64439    1.02776

Table 16.2: Bahadur efficiencies of the mean test, t-test and the sign test with respect to the Wilcoxon test when the level of contamination is 10%

ε       θ        e(m, w)    e(t, w)    e(s, w)
0.10    0.000    0.72819    0.72819    0.72689
        0.250    0.73488    0.75380    0.73400
        0.500    0.75811    0.81043    0.75505
        1.000    0.87895    0.91614    0.83311
        1.500    1.12328    1.05802    0.93433
        2.000    1.47700    1.24380    1.01025
        2.500    1.67873    1.33772    1.02781
        3.000    2.32384    1.58321    1.02866

Table 16.3: Bahadur efficiencies of the mean test, t-test and the sign test with respect to the Wilcoxon test when the level of contamination is 25%

ε       θ        e(m, w)    e(t, w)    e(s, w)
0.25    0.000    0.61885    0.61885    0.82077
        0.250    0.62890    0.63528    0.82785
        0.500    0.65991    0.68437    0.84847
        1.000    0.79132    0.86165    0.91906
        1.500    1.01506    1.06753    0.99199
        2.000    1.30210    1.24270    1.02229
        2.500    1.61679    1.38110    1.00918
        3.000    1.94835    1.49314    0.98289

Figure 16.1: Bahadur efficiency of the mean test with respect to the Wilcoxon test

Figure 16.3: Bahadur efficiency of the sign test with respect to the Wilcoxon test
Abstract

The generalized estimating equations (GEE) introduced by Liang and Zeger (Biometrika 73 (1986) 13-22) have been widely used over the past decade to analyze longitudinal data. The method uses a generalized quasi-score function estimate for the regression coefficients, and moment estimates for the correlation parameters. Recently, Crowder (Biometrika 82 (1995) 407-410) has pointed out some pitfalls with the estimation of the correlation parameters in the GEE method. In this paper we present a new method for estimating the correlation parameters which overcomes those pitfalls. For some commonly assumed correlation structures, we obtain unique feasible estimates for the correlation parameters. Large sample properties of our estimates are also established. © 1997 Elsevier Science B.V.

AMS  classification:  62J12;  62F10;  62F12 

Keywords :  GEE;  Longitudinal  data;  Positive  definite;  Quasi-likelihood;  Repeated  measures; 
Generalized  least  squares 


1.  Introduction 

The  statistical  analysis  of  longitudinal  data  has  been  the  topic  of  numerous  statistical 
papers  in  recent  years.  Several  books  on  the  topic  have  also  been  published,  for  example 
Diggle  et  al.  (1994),  Jones  (1993)  and  Lindsey  (1993).  Such  data  naturally  occur  when 
repeated  observations  are  taken  on  individuals,  or  the  data  is  taken  on  clusters  or  groups 
of  subjects  sharing  similar  characteristics. 

In a landmark paper, Liang and Zeger (1986) introduced the generalized estimating equations (GEE) for analyzing longitudinal data. The setup and the method can be briefly described as follows. Let $Y_i = (y_{i1}, \ldots, y_{it_i})'$ be a vector of repeated measurements taken on the $i$th subject; associated with each measurement $y_{ij}$ is a vector of covariates $x_{ij} = (x_{ij1}, \ldots, x_{ijp})'$, $1 \le j \le t_i$, $1 \le i \le m$. We will assume that the $Y_i$'s are uncorrelated. We do not specify the joint distribution of the vector $Y_i$, but do make some assumptions concerning the moments. Let $E(y_{ij}) = \mu_{ij}$, $\operatorname{var}(y_{ij}) = \phi\, h(\mu_{ij})$, where $\phi > 0$ may be a known constant or an unknown scale parameter. The variance function $h$ is assumed to be a known function. We also assume that there is an invertible function $g$, known as the 'link function', such that $\mu_{ij} = g^{-1}(x_{ij}'\beta)$, where $\beta = (\beta_1, \ldots, \beta_p)'$ is a vector of regression coefficients. Our main parameter of interest is $\beta$. For simplicity, we will consider the balanced case in this paper; henceforth, we will set $t_i = t$ for all $i$. Extensions of our results to the unbalanced case and missing data situations will appear elsewhere.

*Tel. +1 804683 3897; fax: +1 804683 3885; e-mail: nrc@math.odu.edu.

The idea of Liang and Zeger (1986) is to model the dependence among the repeated measurements on the $i$th subject in the form of a 'working correlation matrix' $R(\alpha)$, which is assumed to be a function of a vector of parameters $\alpha = (\alpha_1, \ldots, \alpha_q)'$. The covariance matrix of $Y_i$ is then given by $\phi \Sigma_i$, where $\Sigma_i = A_i^{1/2}(\beta) R(\alpha) A_i^{1/2}(\beta)$ and $A_i(\beta) = \operatorname{diag}(h(\mu_{i1}), h(\mu_{i2}), \ldots, h(\mu_{it}))$. The parameter $\alpha$ is considered to be a nuisance parameter. Let $\mathcal{S}$ be the subset of $\mathbb{R}^q$ such that $R(\alpha)$ is a positive-definite matrix for $\alpha \in \mathcal{S}$. In most examples the set $\mathcal{S}$ is an open convex subset of $\mathbb{R}^q$ and $R(\alpha)$ converges to a positive semi-definite matrix as $\alpha$ approaches the boundary of the set $\mathcal{S}$. Liang and Zeger (1986) suggest estimating $\beta$ using a GEE and estimating $\alpha$ and $\phi$ using moment estimates via the current Pearsonian residuals. The estimate of $\beta$ based on the GEE is essentially a multivariate analog of the quasi-score function estimate based on the quasi-likelihood method. See Wedderburn (1974) and McCullagh (1983).

The GEE approach has several inherent pitfalls. Liang and Zeger (1986) have established consistency and asymptotic normality of their estimate of $\beta$ as $m \to \infty$. Their proof depended on the use of $m^{1/2}$-consistent estimates for both $\alpha$ and $\phi$. Using simple calculations, Crowder (1995) has demonstrated that there can be no general asymptotic theory supporting existence or consistency of the joint distribution of the estimates of $\beta$ and $\alpha$. He also used examples to show that the moment estimate of $\alpha$ might not fall in the set $\mathcal{S}$ of feasible values if the correlation structure is misspecified, thus crippling the whole estimation procedure. Prentice (1988) suggested another GEE for the estimate of $\alpha$ and established asymptotic normality for the joint distribution of his estimates of $\beta$ and $\alpha$. Prentice and Zhao (1991), extending the idea of Prentice (1988), introduced estimating equations in an ad hoc fashion for the covariance parameter estimation for a general multivariate response. There is no guarantee, however, that the suggested estimates of the correlation parameters in either of those papers will fall within the set $\mathcal{S}$ of feasible values for small and even for moderately large samples.

In this paper we give a new approach to estimating the nuisance parameter $\alpha$. Our method can be regarded as an extension of the method of (generalized) least squares, where we assume that the elements of the covariance matrix are functions of the regression parameters and, moreover, that the off-diagonal elements are also functions of some unknown nuisance parameters. A partial minimization is then performed with respect to both the regression parameters and the unknown nuisance parameters.

N. Rao Chaganty / Journal of Statistical Planning and Inference 63 (1997) 39-54

In addition to yielding feasible estimates of $\alpha$, our approach has several other advantages. The method of Prentice (1988) is computationally intensive, whereas in this paper we have closed-form expressions for our estimates of $\alpha$ for some correlation structures. The method of Liang and Zeger (1986), for some correlation models, requires estimation of $\phi$ prior to the estimation of $\beta$ and $\alpha$. Our estimates of $\beta$ and $\alpha$ are independent of the value of $\phi$. Unlike in Liang and Zeger (1986, Theorem 2), we will establish consistency and asymptotic normality of our estimate of $\beta$ without making any assumptions about the asymptotic properties of the estimates of $\alpha$ and $\phi$.

The organization of this paper is as follows. In Section 2 we will give a motivation and derive an alternative set of estimating equations for $\beta$ and $\alpha$. The estimating equation for $\beta$ is the same as the GEE, whereas the estimating equation for $\alpha$ is new. The method of solving the estimating equations will be discussed in Section 3. In Section 4 the existence of a unique feasible estimate, $\hat{\alpha}$, for $\alpha$ will be established and a closed-form expression for $\hat{\alpha}$ for most of the commonly assumed correlation structures will be derived. In Section 5 we derive the large sample properties of our estimates, in particular showing that $\hat{\beta}$ is consistent and $\hat{\alpha}$ is asymptotically biased. We will also prove that $\hat{\beta}$ and $\hat{\alpha}$ are jointly asymptotically normal and obtain expressions for the asymptotic covariances, and furthermore will show that the asymptotic distribution of $\hat{\alpha}$ does not depend on $\beta$. In Section 6 we present some simulation results, which show that for small samples, our estimate of $\beta$ is highly efficient compared with the GEE estimate. Finally, the proofs are given in the appendix.


2.  Estimating  equations 

This section outlines our new method of estimation of the unknown parameters $\beta$, $\alpha$, and $\phi$. For the longitudinal data setup described in Section 1, since we have no knowledge of the underlying distribution, it is natural to estimate the unknown parameters by the principle of (generalized) least squares. This requires minimizing the quadratic form

$$Q(\beta, \alpha) = \frac{1}{\phi} \sum_{i=1}^m (Y_i - \mu_i(\beta))' \Sigma_i^{-1} (Y_i - \mu_i(\beta)) = \frac{1}{\phi} \sum_{i=1}^m (Y_i - \mu_i(\beta))' A_i^{-1/2}(\beta) R^{-1}(\alpha) A_i^{-1/2}(\beta) (Y_i - \mu_i(\beta)). \tag{2.1}$$

Equating to zero the partial derivative of (2.1) with respect to $\alpha$ gives the first set of estimating equations:

$$\sum_{i=1}^m Z_i' \frac{\partial R^{-1}(\alpha)}{\partial \alpha_j} Z_i = 0, \quad 1 \le j \le q, \tag{2.2}$$

where $Z_i = A_i^{-1/2}(\beta)(Y_i - \mu_i(\beta))$, $1 \le i \le m$. Strictly speaking, the estimating equation for $\beta$ should now be obtained by differentiating (2.1) with respect to $\beta$. However, we would like to avoid certain complications that arise with the differentiation, caused by $\beta$ appearing both in the mean vector $\mu_i(\beta)$ and the variance matrix $A_i(\beta)$. More importantly, we would like to get an estimating equation which yields an unbiased estimate for $\beta$ in some cases where $\alpha$ is known. Finally, we would like our estimate of $\beta$ to coincide with the maximum likelihood estimate in cases where the observations $y_{ij}$'s are a random sample from an exponential family of distributions. To derive an estimating equation for $\beta$ which satisfies the three aforementioned requirements, we will treat the quadratic form (2.1) as a function of three variables:

$$Q(\beta, \beta^*, \alpha) = \sum_{i=1}^m (Y_i - \mu_i(\beta))' A_i^{-1/2}(\beta^*) R^{-1}(\alpha) A_i^{-1/2}(\beta^*) (Y_i - \mu_i(\beta)). \tag{2.3}$$

Carrying out the differentiation of (2.3) with respect to $\beta$, then substituting $\beta^* = \beta$, gives the estimating equation

$$\sum_{i=1}^m D_i'(\beta) A_i^{-1/2}(\beta) R^{-1}(\alpha) Z_i = 0, \tag{2.4}$$

where $D_i(\beta) = \partial \mu_i / \partial \beta'$.

Solving Eq. (2.4) for $\beta$ amounts to searching for the minimum values of (2.3) along cross sections perpendicular to the $\beta^*$ axis, and choosing the value that falls on the 45° line with respect to the $(\beta, \beta^*)$ axes. But the principle of (generalized) least squares requires finding the infimum of (2.3) along the 45° line with respect to the $(\beta, \beta^*)$ axes. In general these two methods of minimization do not yield the same estimate for $\beta$, though they do coincide if the global infimum of (2.3) with respect to $(\beta, \beta^*)$ happens to fall on the 45° line. Eq. (2.4) is exactly the equation proposed by Liang and Zeger (1986) to estimate $\beta$, and can also be derived using the principle of quasi-likelihood.

Our method of estimating the parameters is to solve the estimating Eqs. (2.2) and (2.4) simultaneously for $\beta$ and $\alpha$ to obtain estimates $\hat{\beta}$ and $\hat{\alpha}$. A step-by-step recursive algorithm for solving the equations, based on a Fisher scoring method similar to the one proposed by Liang and Zeger (1986), is given in Section 3. From the definition of $\hat{\beta}$ and $\hat{\alpha}$ we have

$$[\ldots] \quad \text{for all } \beta, \alpha, \tag{2.5}$$

where the function $Q$ is defined in (2.3). Since the estimates do not fully conform to the principle of (generalized) least squares, it is reasonable to call our estimates $\hat{\beta}$ and $\hat{\alpha}$ 'quasi-least squares estimates' of $\beta$ and $\alpha$, respectively.

Suppose that $\phi$ is an unknown scale parameter; it plays the same role as $\sigma^2$ in ordinary least squares (OLS) theory. See Rao (1973, p. 227). In OLS, $\sigma^2$ is estimated using the mean residual sum of squares, and the same approach here suggests estimating $\phi$ by

$$\hat{\phi} = \frac{1}{mt} \sum_{i=1}^m \hat{Z}_i' \hat{Z}_i, \tag{2.6}$$

where $\hat{Z}_i = A_i^{-1/2}(\hat{\beta})(Y_i - \mu_i(\hat{\beta}))$. If a bias-corrected estimate is preferable, we can use $\hat{\phi}_b = mt\hat{\phi}/(mt - p)$ instead.
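3.  Iterative  procedure  for  the  estimates

We will now study methods of solving Eqs. (2.2) and (2.4) for $\beta$ and $\alpha$. Let

$$f(\alpha) = \sum_{i=1}^m Z_i' R^{-1}(\alpha) Z_i, \tag{3.1}$$

where $Z_i = A_i^{-1/2}(\beta)(Y_i - \mu_i(\beta))$, for $1 \le i \le m$. Eq. (2.2) can be rewritten as

$$\frac{\partial f(\alpha)}{\partial \alpha_j} = -\sum_{i=1}^m Z_i' R^{-1}(\alpha) \frac{\partial R(\alpha)}{\partial \alpha_j} R^{-1}(\alpha) Z_i = 0, \quad 1 \le j \le q. \tag{3.2}$$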


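For many commonly employed correlation structures where $R^{-1}(\alpha)$ is readily available, Eq. (3.2) can be solved explicitly for $\alpha$ in terms of the $Z_i$'s. In other cases an alternate way of solving Eq. (2.2) is to use the spectral decomposition

$$R(\alpha) = P(\alpha) \Lambda(\alpha) P'(\alpha),$$

where $P(\alpha)$ is an orthogonal matrix of eigenvectors and $\Lambda(\alpha) = \operatorname{diag}(\lambda_k(\alpha))$ is a diagonal matrix consisting of the eigenvalues of $R(\alpha)$. We will see later that for some commonly employed correlation structures $R(\alpha)$, the matrix of eigenvectors $P(\alpha) = P$ does not depend on $\alpha$; only the eigenvalues $\lambda_k(\alpha)$ depend on $\alpha$. In this case we can rewrite Eq. (3.1) as

$$f(\alpha) = \sum_{i=1}^m Z_i' P \Lambda^{-1}(\alpha) P' Z_i = \sum_{i=1}^m W_i' \Lambda^{-1}(\alpha) W_i = \sum_{i=1}^m \sum_{k=1}^t \frac{w_{ik}^2}{\lambda_k(\alpha)} = \sum_{k=1}^t \left( \sum_{i=1}^m \frac{w_{ik}^2}{\lambda_k(\alpha)} \right), \tag{3.3}$$

where $W_i = P' Z_i = (w_{ik})$. Differentiating (3.3) with respect to $\alpha$ we get the following set of estimating equations:

$$\frac{\partial f(\alpha)}{\partial \alpha_j} = \frac{\partial}{\partial \alpha_j} \left\{ \sum_{k=1}^t \left( \sum_{i=1}^m \frac{w_{ik}^2}{\lambda_k(\alpha)} \right) \right\} = 0, \tag{3.4}$$

which can be used instead of (3.2) to get an estimate of $\alpha$.

An iterative method for obtaining the estimates $\hat{\beta}$, $\hat{\alpha}$ of $\beta$ and $\alpha$, respectively, can be described as follows: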

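Step 1: Choose an initial value $\tilde{\beta}$ for $\beta$.

Step 2: Compute $\tilde{A}_i = A_i(\tilde{\beta})$, $\tilde{\mu}_i = \mu_i(\tilde{\beta})$, $\tilde{Z}_i = \tilde{A}_i^{-1/2}(Y_i - \tilde{\mu}_i)$ and $\tilde{D}_i = D_i(\tilde{\beta})$.

Step 3: Solve for $\tilde{\alpha}$ using either Eq. (3.2) or Eq. (3.4). Compute $\tilde{R} = R(\tilde{\alpha})$ and $\tilde{\Sigma}_i = \tilde{A}_i^{1/2} \tilde{R} \tilde{A}_i^{1/2}$.

Step 4: Update the value of $\beta$ as

$$\hat{\beta} = \tilde{\beta} + \left\{ \sum_{i=1}^m \tilde{D}_i' \tilde{\Sigma}_i^{-1} \tilde{D}_i \right\}^{-1} \left\{ \sum_{i=1}^m \tilde{D}_i' \tilde{\Sigma}_i^{-1} (Y_i - \tilde{\mu}_i) \right\}.$$

Step 5: Stop the process if $\hat{\beta} \approx \tilde{\beta}$ and take $\hat{\beta}$ as an estimate of $\beta$. The estimate of $\alpha$ is given by $\hat{\alpha} = \tilde{\alpha}$. Otherwise, repeat Steps 2-4 replacing $\tilde{\beta}$ by $\hat{\beta}$.

In the next section we will show that the estimate $\tilde{\alpha}$ in Step 3 falls within the set of feasible values at every step of the iteration for commonly assumed correlation structures.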


4.  Special  correlation  structures 

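We will now discuss solutions for Eq. (3.2) for commonly assumed correlation structures. In some of the examples, we will also obtain unique feasible, closed-form solutions for the estimate of $\alpha$. In general, the existence of a unique solution $\hat{\alpha} \in \mathcal{S}$ for the equations given by (3.2) can be shown as follows. Let us denote the first and second order partial derivatives of $R(\alpha)$ by the matrices

$$\frac{\partial R(\alpha)}{\partial \alpha} = \operatorname{diag}\left( \frac{\partial R(\alpha)}{\partial \alpha_j} \right), \qquad \frac{\partial^2 R(\alpha)}{\partial \alpha^2} = \left( \frac{\partial^2 R(\alpha)}{\partial \alpha_j \partial \alpha_l} \right),$$

respectively, both of order $qt \times qt$. We will use the symbol $\otimes$ to denote the Kronecker product between two matrices. It is easy to verify that the matrix of second-order partial derivatives of $f(\alpha)$ defined in (3.1) can be written as

$$\nabla^2 f(\alpha) = \sum_{i=1}^m \left\{ 2 (I_q \otimes Z_i' R^{-1}) \frac{\partial R}{\partial \alpha} (e e' \otimes R^{-1}) \frac{\partial R}{\partial \alpha} (I_q \otimes R^{-1} Z_i) - (I_q \otimes Z_i' R^{-1}) \frac{\partial^2 R}{\partial \alpha^2} (I_q \otimes R^{-1} Z_i) \right\},$$

where $R^{-1} = R^{-1}(\alpha)$, $I_q$ is the identity matrix of order $q$ and $e$ is a $q \times 1$ column vector of ones. For several correlation structures, the elements of the correlation matrix $R(\alpha)$ are linear functions of $\alpha$ and we have $\partial^2 R(\alpha)/\partial \alpha^2 = 0$. It is therefore easy to verify that $\nabla^2 f(\alpha)$ is a positive definite matrix for $\alpha \in \mathcal{S}$, for all $m$, or in some cases for $m \ge t$. Hence $f(\alpha)$ is a strictly convex function. Furthermore, $f(\alpha) \to \infty$ as $\alpha$ approaches the boundary of $\mathcal{S}$. It thus has a unique minimum at $\hat{\alpha} \in \mathcal{S}$, where $\hat{\alpha}$ is such that $\nabla f(\hat{\alpha}) = 0$. Note that in Examples 4.1-4.3 below $q = 1$ and $\alpha_1 = \rho$.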

Example 4.1. Suppose that the observations on each subject are equicorrelated with correlation $\rho$. The correlation matrix equals $R(\rho) = (1 - \rho) I_t + \rho e e'$, where $e$ is the $t \times 1$ vector of ones and $\rho \in \mathcal{S} = (-1/(t - 1), 1)$. For this correlation structure, it is well known that

$$R^{-1}(\rho) = \frac{1}{1 - \rho} I_t - \frac{\rho}{(1 - \rho)(1 + (t - 1)\rho)} e e'.$$

Thus in this case (3.2) reduces to

$$\sum_{i=1}^m Z_i' Z_i - \frac{1 + (t - 1)\rho^2}{(1 + (t - 1)\rho)^2} \sum_{i=1}^m (e' Z_i)^2 = 0. \tag{4.1}$$

Let $\bar{Z}_i = (e' Z_i)/t$ and $S_i^2 = \{Z_i'(I_t - e e'/t) Z_i\}/(t - 1)$ be the mean and variance of the components of the vector $Z_i$. If we let $d = t \left( \sum_{i=1}^m \bar{Z}_i^2 \right) / \left( \sum_{i=1}^m S_i^2 \right)$ then Eq. (4.1) can be written as

$$\left\{ 1 + \frac{t \rho}{1 - \rho} \right\}^2 = d. \tag{4.2}$$

The value of $\hat{\rho} \in \mathcal{S}$ satisfying (4.2) is given by

$$\hat{\rho} = \frac{d^{1/2} - 1}{d^{1/2} + (t - 1)}. \tag{4.3}$$

We can use the above value of $\hat{\rho}$ in Step 3 of the iterative process for this correlation structure. In this example the method of Liang and Zeger (1986) requires the estimation of $\phi$ prior to the estimation of $\rho$, whereas our estimate $\hat{\rho}$ given by (4.3) is independent of $\phi$.
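The closed-form estimate for the equicorrelated structure can be sketched in a few lines. The function name and the list-of-lists data layout are assumptions made here for illustration; the residual vectors $Z_i$ are taken as given, and the formula used is $\hat{\rho} = (d^{1/2} - 1)/(d^{1/2} + t - 1)$ with $d = t \sum_i \bar{Z}_i^2 / \sum_i S_i^2$.

```python
import math

def rho_equicorrelated(Z):
    """Quasi-least squares estimate of rho for the equicorrelated structure.

    Z is a list of m residual vectors, each a list of t numbers (t >= 2).
    """
    t = len(Z[0])
    sum_mean_sq = 0.0   # sum over subjects of the squared component mean
    sum_var = 0.0       # sum over subjects of the component variance
    for z in Z:
        zbar = sum(z) / t
        sum_mean_sq += zbar ** 2
        sum_var += sum((x - zbar) ** 2 for x in z) / (t - 1)
    d = t * sum_mean_sq / sum_var
    root = math.sqrt(d)
    return (root - 1) / (root + t - 1)
```

Because $d^{1/2} \ge 0$, the returned value is never below $-1/(t-1)$ and is always below 1, so the estimate stays in the feasible interval without any separate check. The estimate also does not change if every residual is rescaled by a common constant, which is why no estimate of $\phi$ is needed.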


Example 4.2. Let the correlation matrix $R(\rho)$ be a tridiagonal matrix, with 1 on the diagonal and $\rho$ on the upper and lower diagonals. This is equivalent to the one-dependent model. The eigenvalues and eigenvectors of $R(\rho)$ are given by

$$\lambda_k(\rho) = 1 + 2\rho \cos\{k\pi/(t + 1)\}, \quad 1 \le k \le t,$$

and

$$x_k = (\sin\{k\pi/(t + 1)\}, \ldots, \sin\{t k \pi/(t + 1)\})', \quad 1 \le k \le t,$$

respectively. We can verify that $R(\rho)$ is positive definite if and only if $\rho \in \mathcal{S} = (\rho_1, \rho_t)$, where $\rho_k = -1/(2\cos\{k\pi/(t + 1)\})$. Clearly, in this example the eigenvectors do not depend on $\rho$. Since the $x_k$'s are not orthonormal we can construct, using Gram-Schmidt orthogonalization, a set of orthonormal eigenvectors $\{p_k, 1 \le k \le t\}$ from the $x_k$'s. Let $W_i = P' Z_i = (w_{ik})$, where the $k$th column of $P$ is $p_k$. In this case we can verify that Eq. (3.4) reduces to

$$f'(\rho) = \frac{d}{d\rho} \sum_{k=1}^t \frac{\sum_{i=1}^m w_{ik}^2}{\lambda_k(\rho)} = \frac{d}{d\rho} \sum_{k=1}^t \frac{\sum_{i=1}^m w_{ik}^2}{1 + 2\rho\cos\{k\pi/(t + 1)\}} = -\sum_{k=1}^t \frac{2\cos\{k\pi/(t + 1)\} \sum_{i=1}^m w_{ik}^2}{(1 + 2\rho\cos\{k\pi/(t + 1)\})^2} = 0.$$

It is easy to check that $f'(\rho_1) = -\infty$, $f'(\rho_t) = \infty$ and $f'(\rho)$ is continuous on the interval $(\rho_1, \rho_t)$. Therefore, there exists a $\hat{\rho}$ such that $f'(\hat{\rho}) = 0$. This establishes the existence of a solution for the above equation. The value of $\hat{\rho}$ can be computed numerically and can be used in Step 3 of the iterative process. We can also handle the $l$-dependent ($l > 1$) structure in a similar manner, though the expressions for the eigenvalues and eigenvectors are not as simple as in the case $l = 1$. Unlike the method of Liang and Zeger (1986), no estimate of $\phi$ is required to get $\hat{\rho}$ in this example.
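One way the numerical solution for the tridiagonal structure might be computed is by bisection on $f'(\rho)$. The sketch below is an illustration under stated assumptions, not the paper's algorithm: the function names are hypothetical, and instead of a Gram-Schmidt step it uses the fact that the sine vectors are already mutually orthogonal with common length $\sqrt{(t+1)/2}$, so scaling by $\sqrt{2/(t+1)}$ makes them orthonormal.

```python
import math

def eigen_sums(Z):
    """Return (cosines, v), where v[k-1] is the sum over subjects of w_ik^2."""
    t = len(Z[0])
    cosines = [math.cos(k * math.pi / (t + 1)) for k in range(1, t + 1)]
    norm = math.sqrt(2.0 / (t + 1))  # scales each sine eigenvector to unit length
    v = [0.0] * t
    for z in Z:
        for k in range(1, t + 1):
            w = norm * sum(math.sin(j * k * math.pi / (t + 1)) * z[j - 1]
                           for j in range(1, t + 1))
            v[k - 1] += w * w
    return cosines, v

def f(rho, cosines, v):
    """Objective f(rho): sum over k of v_k / lambda_k(rho)."""
    return sum(vk / (1 + 2 * rho * ck) for ck, vk in zip(cosines, v))

def f_prime(rho, cosines, v):
    """Derivative of f with respect to rho."""
    return -sum(2 * ck * vk / (1 + 2 * rho * ck) ** 2 for ck, vk in zip(cosines, v))

def rho_tridiagonal(Z, iterations=100):
    """Root of f'(rho) inside the feasible interval, found by bisection."""
    cosines, v = eigen_sums(Z)
    lo, hi = -1 / (2 * cosines[0]), 1 / (2 * cosines[0])  # endpoints rho_1 and rho_t
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f_prime(mid, cosines, v) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is a natural choice here because $f$ is convex on the feasible interval, so $f'$ is increasing and changes sign exactly once. The midpoint never touches the endpoints, where the eigenvalues vanish.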

Example  4.3.  Suppose  the  correlation  matrix  i?(p)  =  (pl'--'l),  where  p  e  Sf  = 

This  structure  is  the  well  known  first-order  autoregressive  (AR(1))  structure.  Note  that 

=(7T ^{Wa-Pc,}, 

where  Ci  =  diag(0, 1,...,  1,0)  and  C\  is  a  tridiagonal  matrix  with  0  on  the  diagonal 
and  1  on  the  upper  and  lower  diagonals.  Thus 

f(p)  =  'tz,iR-\p)Zi 
1=1 

=  77— -jt  {  t  Z’A  +  (?£  z;c2z,  -  P  t  z’tCtzX . 

(1  -p*)  l  i=l  /= l  i=l  J 

Equating to zero the derivative of $f(\rho)$ we get

\[
a_m\rho^2 - 2b_m\rho + a_m = 0, \qquad (4.4)
\]

where $a_m = \sum_{i=1}^{m} Z_i'C_1Z_i$ and $b_m = \sum_{i=1}^{m} Z_i'(I_t + C_2)Z_i$. Note that the elements of $R(\rho)$
are not linear functions of $\rho$ in this example. We will show in Appendix A that there
is a unique root of Eq. (4.4) in the interval $\mathcal{S} = (-1, 1)$, and it is given by

\[
\hat\rho = \frac{b_m - \{b_m^2 - a_m^2\}^{1/2}}{a_m}. \qquad (4.5)
\]

The value of $\hat\rho$ can be used in Step 3 of the iterative process. In this example, if we
use the method of Liang and Zeger (1986), an estimate of $\phi$ must be computed in the
determination of the estimate of $\beta$, whereas our method does not require estimation of
$\phi$ prior to the estimation of $\beta$.
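
The closed-form root (4.5) is simple to compute directly. The following Python sketch is one way to do so; the function names and the residual values are illustrative assumptions, and the root is evaluated in the algebraically equivalent form $a_m/[b_m + \{b_m^2 - a_m^2\}^{1/2}]$, which avoids dividing by $a_m$ when it is close to zero.

```python
import math

def ar1_sums(Z):
    """a_m = sum_i Z_i' C1 Z_i and b_m = sum_i Z_i' (I_t + C2) Z_i for residual vectors Z_i."""
    a = 0.0
    b = 0.0
    for z in Z:
        t = len(z)
        # Z' C1 Z is twice the sum of products of neighbouring residuals
        a += 2.0 * sum(z[j] * z[j + 1] for j in range(t - 1))
        # Z' (I + C2) Z counts the interior squared residuals twice
        b += sum(v * v for v in z) + sum(v * v for v in z[1:t - 1])
    return a, b

def ar1_rho(Z):
    """Root of a_m rho^2 - 2 b_m rho + a_m = 0 lying in (-1, 1), as in Eq. (4.5)."""
    a, b = ar1_sums(Z)
    # Same value as (b - sqrt(b^2 - a^2)) / a, written to stay stable when a is near 0
    return a / (b + math.sqrt(max(b * b - a * a, 0.0)))

# Illustrative standardized residuals for m = 3 subjects and t = 3 occasions
Z = [[1.0, 0.8, 0.5], [-0.3, -0.1, 0.2], [0.7, 0.9, 1.1]]
rho_hat = ar1_rho(Z)
```

Note that no estimate of $\phi$ enters the computation, in line with the remark above.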

Example 4.4. We now consider the case where the correlation matrix $R$ is totally
unspecified. To get an estimate of $R$, we need to

\[
\min_{R}\sum_{i=1}^{m} Z_i'R^{-1}Z_i = \min_{R}\,\mathrm{tr}(ZR^{-1}), \qquad (4.6)
\]

where $Z = \sum_{i=1}^{m} Z_iZ_i'$. Let us assume that $m \ge t$, which is a reasonable assumption
considering the fact that we have $t(t-1)/2$ unknown correlation parameters. The matrix $Z$
is positive definite in this case. It has been shown by Whittle (1958, p. 234, Lemma 3)
that there exists a unique, positive-definite correlation matrix $\hat R$ where the minimum
(4.6) is attained. See also Olkin and Pratt (1958). The correlation matrix $\hat R$ can be
obtained by solving the equation

\[
Z = \hat R\,\Delta\,\hat R, \qquad (4.7)
\]

where $\Delta$ is a diagonal matrix of positive elements. It follows from the results of Olkin
and Pratt (1958, p. 231) that the solution to Eq. (4.7) is given by

\[
\hat R = \Delta^{-1/2}(\Delta^{1/2}Z\Delta^{1/2})^{1/2}\Delta^{-1/2} \qquad (4.8)
\]

and the diagonal matrix $\Delta$ satisfies the fixed point equation

\[
\Delta = \mathrm{diag}\{(\Delta^{1/2}Z\Delta^{1/2})^{1/2}\}. \qquad (4.9)
\]

For a given $Z$, the diagonal matrix $\Delta$ satisfying (4.9) can be obtained recursively
starting with a trial value $\Delta_0$ and computing $\Delta_k = \mathrm{diag}\{(\Delta_{k-1}^{1/2}Z\Delta_{k-1}^{1/2})^{1/2}\}$ at the $k$th step.
The proof that this fixed point iteration scheme converges to the unique solution of
Eq. (4.9) and related results will appear elsewhere. The estimate $\hat R$ given by Eq. (4.8)
can be used in Step 3 of the iterative process.
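
The fixed point scheme for (4.9) can be sketched in a few lines of Python with NumPy. This is a minimal illustration rather than the author's implementation: the trial value $\Delta_0 = \mathrm{diag}(Z)$, the stopping rule, and the function names are assumptions, and the matrix square root is taken through a symmetric eigendecomposition.

```python
import numpy as np

def sym_sqrt(M):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def unstructured_qls(Zsum, tol=1e-12, max_iter=500):
    """Iterate Delta_k = diag{(Delta_{k-1}^{1/2} Z Delta_{k-1}^{1/2})^{1/2}}; return (R, Delta)."""
    delta = np.diag(Zsum).astype(float)  # trial value Delta_0 (an assumption)
    for _ in range(max_iter):
        half = np.sqrt(delta)
        root = sym_sqrt(half[:, None] * Zsum * half[None, :])
        new_delta = np.diag(root).copy()
        converged = np.max(np.abs(new_delta - delta)) < tol * np.max(new_delta)
        delta = new_delta
        if converged:
            break
    half = np.sqrt(delta)
    root = sym_sqrt(half[:, None] * Zsum * half[None, :])
    R = root / np.outer(half, half)  # Eq. (4.8): Delta^{-1/2} (.)^{1/2} Delta^{-1/2}
    return R, delta

# Illustrative 2 x 2 matrix Z = sum_i Z_i Z_i'
Zsum = np.array([[2.0, 1.0], [1.0, 2.0]])
R_hat, delta_hat = unstructured_qls(Zsum)
```

At convergence the returned matrix has unit diagonal, which is exactly what the fixed point equation (4.9) enforces, and it satisfies (4.7) by construction.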

5.  Large  sample  properties 


In this section, we will study the large sample properties of the quasi-least squares
estimates $\hat\beta$ and $\hat\alpha$ defined in Section 2. In particular, we will show that $\hat\beta_m$ is consistent,
whereas $\hat\alpha_m$ is asymptotically biased as $m \to \infty$; the subscript $m$ emphasizes the
dependence of the estimates on $m$. Theorem 5.1 below shows that the joint distribution
of $(\hat\beta_m, \hat\alpha_m)$ is asymptotically normal. We will introduce some notation before stating
the main theorem of this section. Let $\bar R$ be the true correlation matrix. Recall that $R(\alpha)$
is the working correlation matrix.

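Assume that $\Phi = E(Z_i \otimes Z_iZ_i')$ and $\Psi = E(Z_iZ_i' \otimes Z_iZ_i')$ are finite, where the expectation
is taken under the true correlation structure $\bar R$. Let $\lambda = (\beta, \alpha, \phi)'$ and $\theta = (\beta, \alpha)'$.
Define

\[
\mathcal{I}_{i11}(\theta) = \{D_i'(\beta)A_i^{-1/2}(\beta)R^{-1}(\alpha)A_i^{-1/2}(\beta)D_i(\beta)\}_{p\times p}, \qquad (5.1)
\]

\[
\Gamma_{i11}(\theta) = \{D_i'(\beta)A_i^{-1/2}(\beta)R^{-1}(\alpha)\bar R R^{-1}(\alpha)A_i^{-1/2}(\beta)D_i(\beta)\}_{p\times p}, \qquad (5.2)
\]

\[
\Gamma_{i12}(\lambda) = [\mathrm{tr}\{((e_j'D_i'(\beta)A_i^{-1/2}(\beta)R^{-1}(\alpha)) \otimes B_k)\Phi\}]_{p\times q}, \qquad (5.3)
\]

where $B_k = \partial R^{-1}(\alpha)/\partial\alpha_k$ and $e_j$ is a $p \times 1$ column vector with one at the $j$th row and
zero elsewhere. Note that if the working correlation is indeed the true correlation then
$\Gamma_{i11}(\theta) = \mathcal{I}_{i11}(\theta)$. The following three quantities are useful to describe the asymptotic
distribution of $\hat\alpha_m$:

\[
\begin{aligned}
a(\alpha) &= [\mathrm{tr}\{B_j\bar R\}]_{q\times 1}, \\
\mathcal{I}_{22}(\alpha) &= \left[\mathrm{tr}\left\{\frac{\partial^2R^{-1}(\alpha)}{\partial\alpha_j\,\partial\alpha_k}\bar R\right\}\right]_{q\times q}, \\
\Gamma_{22}(\lambda) &= [\mathrm{tr}\{(B_j \otimes B_k)\Psi\}/\phi^2 - \mathrm{tr}\{B_j\bar R\}\,\mathrm{tr}\{B_k\bar R\}]_{q\times q}.
\end{aligned}
\qquad (5.4)
\]

Define

\[
\mathcal{I}_i(\lambda) = \begin{pmatrix} 2\mathcal{I}_{i11}(\theta) & 0 \\ 0 & \phi\,\mathcal{I}_{22}(\alpha) \end{pmatrix},
\qquad
\Gamma_i(\lambda) = \begin{pmatrix} 4\phi\,\Gamma_{i11}(\theta) & \Gamma_{i12}(\lambda) \\ \Gamma_{i12}'(\lambda) & \phi^2\Gamma_{22}(\lambda) \end{pmatrix}. \qquad (5.5)
\]

Assume that

\[
\frac{1}{m}\sum_{i=1}^{m}\mathcal{I}_i(\lambda) \to \begin{pmatrix} 2\mathcal{I}_{11}(\theta) & 0 \\ 0 & \phi\,\mathcal{I}_{22}(\alpha) \end{pmatrix} = \mathcal{I}(\lambda) \ \text{(say)}, \qquad (5.6)
\]

\[
\frac{1}{m}\sum_{i=1}^{m}\Gamma_i(\lambda) \to \begin{pmatrix} 4\phi\,\Gamma_{11}(\theta) & \Gamma_{12}(\lambda) \\ \Gamma_{12}'(\lambda) & \phi^2\Gamma_{22}(\lambda) \end{pmatrix} = \Gamma(\lambda) \ \text{(say)} \qquad (5.7)
\]

as $m \to \infty$.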

We  are  now  in  a  position  to  state  the  main  theorem  of  this  section.  The  regularity 
conditions needed to establish Theorem 5.1 are the same as the conditions that we would
normally  use  in  a  multivariate  central  limit  theorem  for  independent  but  not  necessarily 
identically  distributed  random  vectors.  In  fact,  conditions  (5.6),  (5.7)  are  similar  to  the 
condition  on  the  covariance  matrices  that  is  in  the  multivariate  central  limit  theorem, 
Theorem  B  of  Serfling  (1981,  p.  30). 

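Theorem 5.1. Let $\lambda = (\beta, \alpha, \phi)'$ be fixed. Let $\theta = (\beta, \alpha)'$ and $\hat\theta_m = (\hat\beta_m, \hat\alpha_m)'$ be the
solution to Eqs. (2.2) and (2.4). Suppose that conditions (5.6) and (5.7) hold. Then
$\hat\beta_m$ is a consistent estimate of $\beta$, whereas $\hat\alpha_m$ is asymptotically biased. Further,

\[
(\hat\theta_m - \theta) \ \text{is} \ AN\left(\mathcal{I}^{-1}(\lambda)\mu(\lambda),\ \frac{\mathcal{I}^{-1}(\lambda)\Gamma(\lambda)\mathcal{I}^{-1}(\lambda)}{m}\right), \qquad (5.8)
\]

where $\mu(\lambda) = (0, \phi a'(\alpha))'$ and $\mathcal{I}(\lambda)$, $\Gamma(\lambda)$, $a(\alpha)$ are defined in (5.6), (5.7) and (5.4).

Proof of Theorem 5.1 is given in Appendix B. It is easy to check that (5.8) implies
that $\hat\beta_m$ and $\hat\alpha_m$ are asymptotically correlated and

\[
\hat\beta_m \ \text{is} \ AN\left(\beta,\ \frac{\phi\,\mathcal{I}_{11}^{-1}(\theta)\Gamma_{11}(\theta)\mathcal{I}_{11}^{-1}(\theta)}{m}\right),
\]

\[
\hat\alpha_m \ \text{is} \ AN\left(\alpha + \mathcal{I}_{22}^{-1}(\alpha)a(\alpha),\ \frac{\mathcal{I}_{22}^{-1}(\alpha)\Gamma_{22}(\lambda)\mathcal{I}_{22}^{-1}(\alpha)}{m}\right).
\]

We can also easily verify that $\hat\phi_m$ given in (2.6) is a consistent estimate of $\phi$
using the fact that $\hat\beta_m$ is a consistent estimate of $\beta$, even if the working correlation is
misspecified. Note that if the working correlation is correctly specified, that is, $R(\alpha) = \bar R$,
then $\Gamma_{11}(\theta) = \mathcal{I}_{11}(\theta)$ and the asymptotic covariance of $\hat\beta_m$ reduces to $\phi\,\mathcal{I}_{11}^{-1}(\theta)/m$.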

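Since in practice the true correlation $\bar R$ is unknown, we can assume that the working
correlation is correctly specified. An estimate of the covariance matrix of $\hat\beta_m$ is then
obtained by replacing the parameters $\alpha$ and $\beta$ in (5.1) with their estimates, giving
us

\[
\widehat{\mathrm{cov}}_1(\hat\beta_m) = \hat\phi_m\left\{\sum_{i=1}^{m} D_i'(\hat\beta_m)\hat\Sigma_i^{-1}D_i(\hat\beta_m)\right\}^{-1}, \qquad (5.9)
\]

where $\hat\Sigma_i = A_i^{1/2}(\hat\beta_m)R(\hat\alpha_m)A_i^{1/2}(\hat\beta_m)$ and $\hat\phi_m$ is given by (2.6). Alternatively, following
Liang and Zeger (1986), Royall (1986), we can estimate the covariance matrix of $\hat\beta_m$
using a model-robust, sandwich-type variance estimator given by

\[
\widehat{\mathrm{cov}}_2(\hat\beta_m) = \left\{\sum_{i=1}^{m} D_i'(\hat\beta_m)\hat\Sigma_i^{-1}D_i(\hat\beta_m)\right\}^{-1}
\left\{\sum_{i=1}^{m} D_i'(\hat\beta_m)\hat\Sigma_i^{-1}\,\widehat{\mathrm{cov}}(Y_i)\,\hat\Sigma_i^{-1}D_i(\hat\beta_m)\right\}
\left\{\sum_{i=1}^{m} D_i'(\hat\beta_m)\hat\Sigma_i^{-1}D_i(\hat\beta_m)\right\}^{-1}, \qquad (5.10)
\]

where $\widehat{\mathrm{cov}}(Y_i) = U_iU_i'$, $U_i = Y_i - \mu_i(\hat\beta_m)$. The estimate (5.9) or (5.10) can be used to
construct confidence intervals for linear functions of $\beta$.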
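
The two covariance estimates can be computed side by side once the fitted quantities are available. The Python sketch below assumes the caller supplies, for each subject, the derivative matrix $D_i$, the matrix $\hat\Sigma_i$ and the residual vector $U_i$; the function names and the tiny one-subject example are illustrative only.

```python
import numpy as np

def model_based_cov(D_list, Sigma_list, phi):
    """Eq. (5.9): phi * {sum_i D_i' Sigma_i^{-1} D_i}^{-1}."""
    bread = sum(D.T @ np.linalg.solve(S, D) for D, S in zip(D_list, Sigma_list))
    return phi * np.linalg.inv(bread)

def sandwich_cov(D_list, Sigma_list, resid_list):
    """Eq. (5.10), with cov(Y_i) estimated by the outer product U_i U_i'."""
    bread = sum(D.T @ np.linalg.solve(S, D) for D, S in zip(D_list, Sigma_list))
    meat = np.zeros_like(bread)
    for D, S, U in zip(D_list, Sigma_list, resid_list):
        g = D.T @ np.linalg.solve(S, U)  # D_i' Sigma_i^{-1} U_i
        meat += np.outer(g, g)
    bread_inv = np.linalg.inv(bread)
    return bread_inv @ meat @ bread_inv

# Illustrative inputs: one subject, two occasions, intercept-only mean
D_list = [np.array([[1.0], [1.0]])]
Sigma_list = [np.eye(2)]
resid_list = [np.array([1.0, 1.0])]
cov1 = model_based_cov(D_list, Sigma_list, phi=2.0)
cov2 = sandwich_cov(D_list, Sigma_list, resid_list)
```

The sandwich form does not use $\hat\phi_m$ at all, since the residual outer products carry the scale information.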

Remark 5.1. It is interesting to note that the asymptotic distribution of $\hat\alpha_m$ does not
depend on $\beta$, unlike the asymptotic distribution of $\hat\beta_m$ which depends on all the
parameters $\beta$, $\alpha$, $\phi$ and $\bar R$. Also, the asymptotic bias of $\hat\alpha_m$ depends only on $\alpha$ and $\bar R$ but
not on $\phi$.

Remark 5.2. In the case where the distribution of the $Z_i$'s is correctly specified and it is
a multivariate normal distribution with mean 0 and covariance matrix $\phi R(\alpha)$ and if
$\partial^2R(\alpha)/\partial\alpha^2 = 0$, then we can show that $\Gamma_{22}(\lambda) = \mathcal{I}_{22}(\alpha)$. Thus in this case we have

\[
\hat\alpha_m \ \text{is} \ AN\left(\alpha + \mathcal{I}_{22}^{-1}(\alpha)a(\alpha),\ \frac{\mathcal{I}_{22}^{-1}(\alpha)}{m}\right). \qquad (5.11)
\]

Remark 5.3. For some working correlation structures, simulation results have shown
that it is possible to reduce the bias of $\hat\alpha_m$ using the jackknife, bias-reducing technique.
Also, since the asymptotic covariance of $\hat\alpha_m$ depends on the third and fourth moments
of the $y_{ij}$'s, it is perhaps best to use the nonparametric method, bootstrap, to estimate
the covariance of $\hat\alpha_m$. See Efron (1982) for an excellent introduction to the jackknife
and the bootstrap methods. On the other hand, in data analysis problems where $\alpha$ is
also an important parameter, the GEE method is preferable, since it uses a consistent
estimate of $\alpha$, provided of course the GEE estimate of $\alpha$ falls within the set of feasible
values.
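
The jackknife mentioned in Remark 5.3 can be illustrated generically. The Python sketch below applies the usual delete-one bias correction to an arbitrary estimator; treating each subject as the unit that is deleted is an assumption, and the toy estimator (a plug-in variance) merely stands in for a routine that would return $\hat\alpha_m$ from a list of subjects.

```python
def jackknife_bias_corrected(subjects, estimator):
    """Delete-one jackknife: m * theta_hat - (m - 1) * mean of the leave-one-out estimates."""
    m = len(subjects)
    full = estimator(subjects)
    leave_one_out = [estimator(subjects[:i] + subjects[i + 1:]) for i in range(m)]
    return m * full - (m - 1) * sum(leave_one_out) / m

def plug_in_variance(xs):
    """A deliberately biased toy estimator (divides by n rather than n - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

data = [1.0, 2.0, 3.0, 4.0]
corrected = jackknife_bias_corrected(data, plug_in_variance)
```

For this toy estimator the correction recovers the usual unbiased sample variance; for $\hat\alpha_m$ the remark only claims a reduction in bias for some working structures.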


6.  Simulation  results  for  small  samples 

In this section we will show, using Monte Carlo simulations, that the relative
efficiency of the quasi-least squares regression parameter estimates can be very high
for small samples when compared to the estimates obtained using the GEE method.
To make a fair comparison between the two methods, we consider an example where
a totally unspecified working correlation structure is appropriate, since in this case the
GEE method also yields feasible estimates for the correlation parameters. We will first
fit a model using the GEE method to a real-life data set and then make a comparison
between the two methods using simulated data from the fitted model.

Consider the data in Table 3.10 of Rencher (1995, p. 92). The data consist of blood
glucose levels on three occasions for 52 women. The variable $y$ represents a fasting
glucose measurement and the covariate $x$ is the glucose measurement one hour after
sugar intake. We have fit a simple linear regression model between $y$ and $x$ for the data
using the GEE method with identity link function ($g(u) = u$), and totally unspecified
working correlation structure. The regression line is estimated to be

\[
y = 61.364 + 0.1098x. \qquad (6.1)
\]

The estimates of the correlation matrix between the three repeated measurements, and
of the scale parameter, are given by

\[
R_0 = \begin{pmatrix} 1 & 0.1971 & -0.0122 \\ 0.1971 & 1 & 0.2081 \\ -0.0122 & 0.2081 & 1 \end{pmatrix}, \qquad \phi_0 = 76.65. \qquad (6.2)
\]

The regression coefficients in (6.1) were highly significant. We will use the model (6.1)
and (6.2), which describes the relationship between the three repeated measurements
on $y$ and $x$, to compare the quasi-least squares and the GEE methods.

To make the comparison between the two methods, we have simulated 1000 replications
of samples of three ($t = 3$) repeated measurements on the variable $y$, using the
$x$ values in Table 3.10 of Rencher (1995, p. 92) on $m$ women for $m = 5, 15, 52$. The
simulations were performed using the values of the parameters in (6.1) and (6.2) and
a Gaussian distribution for the errors. We then fit the true model for each replication
of the simulated data using the quasi-least squares and the GEE methods. For quasi-least
squares we have used the fixed-point iteration scheme described in Example 4.4
to estimate the correlation matrix in the iterative process for obtaining the regression
parameter estimates. Mean square errors (MSE) of the estimates of the intercept and
the slope were computed using the 1000 replications.
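
The data-generating step of such a simulation can be sketched as follows in Python with NumPy. The parameter values are those of (6.1) and (6.2); everything else is an assumption made for illustration, in particular the covariate values, which are random placeholders and not the $x$ values from Rencher's table, and the function name.

```python
import numpy as np

BETA = np.array([61.364, 0.1098])             # intercept and slope from Eq. (6.1)
R0 = np.array([[1.0, 0.1971, -0.0122],
               [0.1971, 1.0, 0.2081],
               [-0.0122, 0.2081, 1.0]])       # correlation matrix from Eq. (6.2)
PHI0 = 76.65                                  # scale parameter from Eq. (6.2)

def simulate_responses(x, rng):
    """Draw an m x 3 array y with mean 61.364 + 0.1098 x and covariance PHI0 * R0 per subject."""
    chol = np.linalg.cholesky(PHI0 * R0)
    errors = rng.standard_normal(x.shape) @ chol.T   # rows are N(0, PHI0 * R0)
    return BETA[0] + BETA[1] * x + errors

rng = np.random.default_rng(0)
x = rng.uniform(80.0, 120.0, size=(52, 3))    # placeholder covariates, NOT Rencher's data
y = simulate_responses(x, rng)
```

Each replication would then be fitted by both methods, and the squared errors of the intercept and slope estimates averaged over replications.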

Table  1  gives  the  relative  efficiencies  of  the  quasi-least  squares  estimates  with  respect 
to  the  GEE  estimates  of  the  regression  parameters  for  various  values  of  m.  The  relative 
efficiency  is  defined  as  the  ratio  of  the  MSE  computed  from  the  GEE  method  to  that 
of  the  MSE  of  the  quasi-least  squares  method.  We  can  see  from  Table  1  that  the 
relative  efficiency  is  very  high  for  m  =  5,  being  more  than  three  for  both  the  intercept 
and  the  slope.  Further,  the  relative  efficiency  is  decreasing  as  m  increases.  But  note 
that  even  for  moderately  large  samples  (m  =  52,  the  size  of  the  original  sample  in 
Rencher  (1995,  p.  92))  the  relative  efficiency  of  the  quasi-least  squares  is  more  than  1. 
Therefore,  the  quasi-least  squares  approach  is  preferable  not  only  for  small  samples, 
but  for  moderately  large  samples  as  well. 
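
The relative efficiency defined above is a plain ratio of mean square errors. A minimal Python sketch, with made-up estimates used only to show the calculation (they are not the simulation output reported in Table 1):

```python
def mean_square_error(estimates, truth):
    """Average squared deviation of the replicated estimates from the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)

def relative_efficiency(gee_estimates, qls_estimates, truth):
    """MSE of the GEE estimates divided by the MSE of the quasi-least squares estimates."""
    return mean_square_error(gee_estimates, truth) / mean_square_error(qls_estimates, truth)

# Made-up slope estimates from four hypothetical replications
gee = [0.10, 0.14, 0.08, 0.12]
qls = [0.11, 0.12, 0.10, 0.11]
re = relative_efficiency(gee, qls, truth=0.1098)
```

A value above 1 favours quasi-least squares, which is how the entries of Table 1 should be read.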
Table 1

Relative efficiencies of the quasi-least squares regression
parameter estimates with respect to the GEE estimates.

m      Intercept    Slope
5      3.387        3.183
15     1.219        1.236
52     1.057        1.062

Table 2

Bias/(standard error) of the estimates of the correlation parameters. The numbers above the diagonal correspond
to the quasi-least squares estimates; numbers below the diagonal are for the GEE estimates.

m = 5
*                  0.1693 (0.2850)    0.0135 (0.2657)
0.0137 (0.5901)    *                  0.1575 (0.2708)
0.0144 (0.5680)    0.0755 (0.5639)    *

m = 15
*                  0.1184 (0.1350)    0.0112 (0.1395)
0.0377 (0.2715)    *                  0.1267 (0.1369)
0.0250 (0.2817)    0.0461 (0.2722)    *

m = 52
*                  0.1030 (0.0709)    0.0051 (0.0743)
0.0126 (0.1390)    *                  0.1105 (0.0704)
0.0128 (0.1493)    0.0171 (0.1378)    *

The  biases  and  the  standard  errors  of  the  estimates  of  the  correlation  parameters  are 
contained  in  Table  2.  As  expected,  the  quasi-least  squares  estimates  have  more  bias 
than  those  from  the  GEE  method.  On  the  other  hand,  the  quasi-least  squares  estimates 
of  the  correlation  parameters  have  smaller  standard  errors  and  therefore  are  more  stable 
than  the  GEE  estimates. 
Appendix A. Proof of unique feasible root in Example 4.3

We can verify that in Example 4.3, the roots of the quadratic equation (4.4) are real
if and only if

\[
\begin{aligned}
& \left\{\sum_{i=1}^{m} Z_i'(I_t + C_2)Z_i\right\}^2 \ge \left\{\sum_{i=1}^{m} Z_i'C_1Z_i\right\}^2 \\
\Leftarrow\ & 1 \ge \max_{z \ne 0}\left|\frac{z'(I_m \otimes C_1)z}{z'\{I_m \otimes (I_t + C_2)\}z}\right| \qquad (A1) \\
\Leftrightarrow\ & 1 \ge |\lambda_{\max}\{(I_m \otimes C_1)(I_m \otimes (I_t + C_2))^{-1}\}| \\
\Leftrightarrow\ & 1 \ge |\lambda_{\max}\{C_1(I_t + C_2)^{-1}\}|,
\end{aligned}
\]

where $z' = (Z_1', \ldots, Z_m')$ and $\lambda_{\max}(A)$ denotes the maximum eigenvalue of $A$. The matrix
$C_1(I_t + C_2)^{-1}$ is similar to a symmetric tridiagonal matrix. Using the Sturm sequence
property of tridiagonal matrices, we can verify that all the eigenvalues of $C_1(I_t + C_2)^{-1}$
fall in the interval $[-1, 1]$. Therefore (A1) holds and we have established that

\[
\sum_{i=1}^{m} Z_i'C_1Z_i \le \sum_{i=1}^{m} Z_i'(I_t + C_2)Z_i \qquad (A2)
\]

for all $Z_i$'s. It is easy to verify, using the inequality (A2), that the root of Eq. (4.4)
that falls in the interval $(-1, 1)$ is given by (4.5) almost surely.
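
The eigenvalue claim used in this proof can be checked numerically for small $t$. The Python sketch below builds $C_1(I_t + C_2)^{-1}$ and records its largest eigenvalue in absolute value; it is a numerical spot check under the definitions of Example 4.3, not a substitute for the Sturm sequence argument, and the function name is illustrative.

```python
import numpy as np

def ar1_ratio_matrix(t):
    """Return C1 (I_t + C2)^{-1} for the t x t matrices C1 and C2 of Example 4.3."""
    C1 = np.diag(np.ones(t - 1), 1) + np.diag(np.ones(t - 1), -1)
    C2 = np.diag([0.0] + [1.0] * (t - 2) + [0.0])
    return C1 @ np.linalg.inv(np.eye(t) + C2)

# Largest absolute eigenvalue for t = 2, ..., 10
spectral_radii = {t: float(np.max(np.abs(np.linalg.eigvals(ar1_ratio_matrix(t)))))
                  for t in range(2, 11)}
```

None of the computed values exceeds 1 beyond rounding error, consistent with the interval $[-1, 1]$ stated in the proof.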

Appendix  B.  Proof  of  Theorem  5.1 

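Fix $\theta = (\beta, \alpha)'$. Let $v_i(\beta^*, \theta) = Z_i^{*\prime}(\beta^*, \beta)R^{-1}(\alpha)Z_i^*(\beta^*, \beta)$, where $Z_i^*(\beta^*, \beta) = A_i^{-1/2}(\beta^*)(Y_i - \mu_i(\beta))$.
Now, for $\|t\| \le K$, $0 < K < \infty$, under standard regularity conditions,
considering a Taylor series expansion around $\theta$ we can write

\[
\begin{aligned}
\ell_m(t) &= \sum_{i=1}^{m}\{v_i(\beta^*, \theta + t/m^{1/2}) - v_i(\beta^*, \theta)\} \\
&= \frac{1}{m^{1/2}}\sum_{i=1}^{m} t'\nabla v_i(\beta^*, \theta) + \frac{1}{2m}\sum_{i=1}^{m} t'\nabla^2v_i(\beta^*, \theta^*)t, \qquad (B1)
\end{aligned}
\]

where $\theta^*$ is a point on the line joining $\theta$ and $\theta + t/m^{1/2}$. The above expansion is true
for any $(\beta^*, \beta, \alpha)$. In particular for $\beta^* = \beta$, we can write (B1) as

\[
\ell_m(t) = \frac{1}{m^{1/2}}\sum_{i=1}^{m} t'\nabla v_i(\theta) + \frac{1}{2m}\sum_{i=1}^{m} t'\nabla^2v_i(\theta^*)t. \qquad (B2)
\]

Now, if we let

\[
\eta_m(t) = \frac{1}{m}\sum_{i=1}^{m}\{\nabla^2v_i(\theta^*) - \nabla^2v_i(\theta)\},
\]

then (B2) can be rewritten as

\[
\begin{aligned}
\ell_m(t) &= \frac{1}{m^{1/2}}\sum_{i=1}^{m} t'\nabla v_i(\theta) + \frac{1}{2m}\sum_{i=1}^{m} t'\nabla^2v_i(\theta)t + \frac{1}{2}t'\eta_m(t)t \\
&= \frac{1}{m^{1/2}}\sum_{i=1}^{m} t'\nabla v_i(\theta) + \frac{1}{2m}\sum_{i=1}^{m} t'\mathcal{I}_i(\lambda)t
+ \frac{1}{2m}\sum_{i=1}^{m} t'\{\nabla^2v_i(\theta) - \mathcal{I}_i(\lambda)\}t + \frac{1}{2}t'\eta_m(t)t, \qquad (B3)
\end{aligned}
\]

where $E\{\nabla^2v_i(\theta)\} = \mathcal{I}_i(\lambda)$. Under the assumption (5.6) it follows from the weak law
of large numbers,

\[
\frac{1}{m}\sum_{i=1}^{m}\{\nabla^2v_i(\theta) - \mathcal{I}_i(\lambda)\} = o_p(1).
\]

Using an argument similar to the one found in Sen and Singer (1993, p. 207), we can
show that $\sup_{\|t\| \le K}|\eta_m(t)|$ tends to zero almost surely as $m \to \infty$. Thus uniformly
for $\|t\| \le K$ we have

\[
\ell_m(t) = \frac{1}{m^{1/2}}\sum_{i=1}^{m} t'\nabla v_i(\theta) + \frac{1}{2m}\sum_{i=1}^{m} t'\mathcal{I}_i(\lambda)t + o_p(1). \qquad (B4)
\]

Disregarding the $o_p(1)$ term and minimizing the right-hand side of (B4) with respect
to $t$, we get the point of minimum as

\[
\hat t_m = \left\{\frac{1}{m}\sum_{i=1}^{m}\mathcal{I}_i(\lambda)\right\}^{-1}\left\{\frac{1}{m^{1/2}}\sum_{i=1}^{m}\nabla v_i(\theta)\right\}. \qquad (B5)
\]

From the definition of $\ell_m(t)$ we can conclude that $\hat t_m$ will also correspond closely to
the quasi-least squares estimate of $\theta$, which is attained at $\hat\theta_m$. Therefore, we have

\[
m^{1/2}(\hat\theta_m - \theta) = \hat t_m + o_p(1). \qquad (B6)
\]

It is easy to verify that

\[
E\{\nabla v_i(\theta)\} = \mu(\lambda) \quad \text{and} \quad \mathrm{cov}\{\nabla v_i(\theta)\} = \Gamma_i(\lambda). \qquad (B7)
\]

From (B5)-(B7) and the weak law of large numbers, we get

\[
\hat\theta_m \to \theta + \mathcal{I}^{-1}(\lambda)\mu(\lambda) \qquad (B8)
\]

in probability, as $m \to \infty$. It is easy to check from (B8) that $\hat\beta_m$ is consistent and $\hat\alpha_m$
is asymptotically biased. Under the assumptions (5.6), (5.7), from (B7), (B6), (B5),
the multivariate central limit theorem and Slutsky's theorem, we can conclude that

\[
\hat\theta_m - \theta = \frac{\hat t_m}{m^{1/2}} + o_p\left(\frac{1}{m^{1/2}}\right) \sim AN\left(\mathcal{I}^{-1}(\lambda)\mu(\lambda),\ \frac{\mathcal{I}^{-1}(\lambda)\Gamma(\lambda)\mathcal{I}^{-1}(\lambda)}{m}\right). \qquad (B9)
\]

This completes the proof of the theorem.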
Summary

Quasi-least  squares  (QLS),  a  marginal  statistical  approach  via  generalized  estimating  equations 
that  is  described  in  the  balanced  data  setting  by  Chaganty  (1997,  Journal  of  Statistical  Planning 
and  Inference  63,  39-54),  allows  for  application  of  a  wide  range  of  working  correlation  structures 
when  analyzing  serially  correlated  data.  We  extend  the  application  of  QLS  to  serially  correlated, 
unequally spaced, and unbalanced data using three useful working correlation models: the
first-order autoregressive (AR(1)), the Markov, and the generalized Markov structure described by
Núñez-Antón and Woodworth (1994, Biometrics 50, 445-456). We compare QLS and the original
formulation of the generalized estimating equation approach (GEE) for these structures,
demonstrating that (i) infeasibility of the GEE correlation parameter estimates can be a problem, (ii) it
is  difficult  to  obtain  consistent  moment  estimates  of  the  correlation  parameters  for  the  generalized 
Markov  structure,  and  (iii)  the  use  of  QLS  can  lead  to  reduced  mean  square  error  of  the  estimate 
of  the  regression  parameter  for  small  samples  of  moderately  correlated  data.  To  choose  between 
alternative  correlation  models,  we  propose  a  criterion  that  is  based  on  the  principle  of  generalized 
least  squares.  Finally,  data  for  which  the  generalized  Markov  structure  is  appropriate  are  analyzed 
to  demonstrate  the  use  of  QLS  in  selecting  a  suitable  working  correlation  structure  and  identifying 
important  covariates. 


1.  Introduction 

In  this  paper,  we  apply  a  statistical  method  based  on  the  generalized  estimating  equation  approach 
of  Liang  and  Zeger  (1986)  to  the  analysis  of  longitudinal  data  that  may  be  difficult  to  analyze  using 
other  established  methods.  We  consider  repeated  measures  data  collected  by  taking  measurements 
of  an  outcome  variable  and  associated  covariates  on  each  of  a  group  of  independent  subjects.  Our 
primary  data  analysis  goal  is  to  identify  important  covariates  and  to  explain  their  effect  on  the 
marginal  mean  of  the  outcome  variable  while  also  accounting  for  the  correlation  among  observations 
on  each  subject. 

Accomplishing  this  research  objective  may  be  difficult  due  to  certain  conditions  that  are  typical 
in  longitudinal  studies.  The  timing  and  total  number  of  measurements  taken  may  vary  from  subject 
to  subject  so  that  the  data  may  be  imbalanced  and  unequally  spaced.  The  outcome  variable  may 
not  be  normally  distributed.  The  intrasubject  correlation  may  be  described  using  a  time-dependent 
pattern. For example, the correlation between two measurements may decrease as they become more
widely separated in time, or two measurements separated by a fixed distance in time may be more
highly correlated if they are collected later during the study rather than earlier.

To describe the marginal mean of the outcome variable as a function of the covariates, one
approach is to use the method of generalized estimating equations (GEE), first proposed by Liang
and Zeger (1986). The method of GEE specifies a generalized linear model for the outcome variable
and models the association among observations on each subject via a working correlation
structure. Estimation proceeds by alternating between solving a generalized estimating equation for the
regression  parameter  and  consistent  moment  estimation  of  the  correlation  parameter.  Many  recent 
publications  discuss  extensions  and  applications  of  the  GEE  approach,  especially  for  correlated 
binary  data.  In  particular,  Prentice  (1988)  and  Zhao  and  Prentice  (1990)  developed  generalizations 
of  GEE  that  Liang,  Zeger,  and  Qaqish  (1992)  refer  to  as  GEE1  and  GEE2,  respectively.  Desmond 
(1997)  gives  a  good  description  of  GEE,  GEE1,  and  GEE2. 

One  widely  accepted  property  of  the  method  of  GEE  is  that,  if  the  correlation  structure  is 
misspecified,  the  estimates  of  the  regression  parameters  will  nevertheless  remain  consistent.  There 
has  been  some  controversy,  however,  regarding  the  effect  of  misspecification  on  the  efficiency  of 
these  estimates.  [For  a  discussion  of  this  topic,  see  papers  by  McDonald  (1993),  Zhao,  Prentice, 
and  Self  (1992),  Fitzmaurice  and  Laird  (1993),  and  Fitzmaurice  (1995).]  In  any  case,  it  is  intuitively 
reasonable  that  careful  modeling  of  the  correlation  structure  leads  to  improved  estimation  of  the 
regression  and  the  correlation  parameters. 

Modeling  the  correlation  structure  of  the  outcomes  on  each  subject  comprises  (i)  identifying 
reasonable correlation structures for the data under consideration, (ii) implementing these
structures in an analysis, and (iii) choosing among the final sets of estimates associated with each of
the  different  structures.  Depending  on  our  initial  identification  of  reasonable  working  correlation 
structures,  we  may  be  limited  in  carrying  out  steps  (ii)  and  (iii)  using  the  method  of  GEE.  For 
some  correlation  structures,  implementing  (ii)  is  difficult,  either  because  the  final  GEE  estimates 
of  their  parameters  are  infeasible  or  because  consistent  moment  estimates  of  their  parameters  are 
not easily obtained. In this paper, we consider three successively generalized spatial correlation
structures that are applicable to serially correlated data: the AR(1), the Markov, and the
generalized Markov structure that was described by Núñez-Antón and Woodworth (1994). For the AR(1)
and  Markov  structures,  GEE  may  yield  infeasible  final  estimates  of  the  correlation  parameters.  For 
the  generalized  Markov  structure,  consistent  moment  estimates  of  the  parameters  are  not  easily 
obtained  so  that  GEE  is  not  easily  applied  for  this  structure.  Carrying  out  (iii)  using  GEE  may  also 
be  difficult  because  the  method  does  not  provide  a  simple  criterion  for  correlation  model  selection. 

In contrast to the original formulation of GEE, QLS does provide a simple basis for
nonasymptotic comparison of different correlation structures. We suggest a criterion for correlation model
selection  that  is  based  on  the  principle  of  generalized  least  squares.  The  QLS  approach  also  allows 
for  consideration  of  the  AR(1),  Markov,  and  generalized  Markov  correlation  structures  and,  for  a 
continuous  outcome  variable,  the  final  QLS  estimates  of  the  parameters  in  these  structures  will  be 
feasible. The goal of this paper is to demonstrate that, when compared with the original
formulation of GEE, QLS can improve our ability to model the correlation in our data. We also conduct
simulations  to  show  that  QLS  can  lead  to  more  efficient  estimation  of  the  regression  parameters. 
Comparisons  between  QLS,  GEE1,  and  GEE2  for  correlated  binary  data  are  planned  as  the  subject 
of  future  research. 

Organization  of  this  paper  is  as  follows.  In  Section  2,  we  establish  notation,  give  a  description 
of  the  method  of  quasi-least  squares,  and  propose  a  criterion  for  correlation  model  selection.  In 
Section  3,  we  discuss  the  AR(1),  Markov,  and  generalized  Markov  correlation  structures,  give  an 
interpretation  of  their  correlation  parameters,  and  discuss  implementation  of  the  method  of  QLS 
for  each  structure.  In  Section  4,  we  describe  simulations  that  compare  QLS  with  GEE  and  a  data 
analysis  that  demonstrates  the  use  of  QLS  in  choosing  an  appropriate  working  correlation  structure 
and  identifying  important  covariates. 

2.  The  Method  of  Quasi-Least  Squares 

2.1  Notation  and  Assumptions 

We consider data comprising vectors $Y_i' = (y_{i1}, y_{i2}, \ldots, y_{in_i})$ of measurements taken on subject $i$
at times $T_i' = (t_{i1}, t_{i2}, \ldots, t_{in_i})$, $0 < t_{i1} < t_{i2} < \cdots < t_{in_i}$. Associated with each measurement $y_{ij}$
is a vector of covariates $x_{ij}' = (x_{ij1}, \ldots, x_{ijp})$, $1 \le j \le n_i$, $1 \le i \le m$. We assume that any
variability in the spacing or number of observations collected on each subject is either the result
of the study design or of a process that is independent of the observed and unobserved data, i.e.,
missing values are missing at random. We do not assume a distributional form for the outcome
variable, only that $E(y_{ij}) = u_{ij}$ and $\mathrm{var}(y_{ij}) = \tau_j\nu(u_{ij})$, where $\tau_j > 0$ is either a known constant
or an unknown parameter. The regression equation $u_{ij} = g^{-1}(x_{ij}'\beta)$ relates the marginal mean of
the outcome variable with covariates measured on each subject, where $\beta' = (\beta_1, \ldots, \beta_p)$ is a
vector of unknown regression coefficients and $g$ is an invertible link function.

We assume that observations taken on different subjects are independent, whereas observations
taken on the same subject are correlated. The covariance matrix $V_i$ of observations on subject $i$ satisfies
$V_i = (\mathcal{T}_iA_i)^{1/2}R_i(\rho)(A_i\mathcal{T}_i)^{1/2}$, where $A_i = \mathrm{diag}(\nu(u_{i1}), \ldots, \nu(u_{in_i}))$, $\mathcal{T}_i = \mathrm{diag}(\tau_1, \ldots, \tau_{n_i})$,
$R_i(\rho)$ is a working correlation matrix, and $\rho = (\rho_1, \rho_2, \ldots, \rho_s)$ is a vector of unknown parameters.
We consider $\rho$ to be a nuisance parameter; its estimation is carried out primarily to aid estimation
of $\beta$. Let $U_i' = (u_{i1}, \ldots, u_{in_i})$ and $Z_i(\beta) = A_i^{-1/2}(Y_i - U_i) = (z_{i1}, \ldots, z_{in_i})'$. We refer to the
quadratic form

\[
Q(\rho, \beta) = \sum_{i=1}^{m} Z_i'(\beta)R_i^{-1}(\rho)Z_i(\beta) \qquad (2.1)
\]

as the generalized error sum of squares.
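
Because the subjects may contribute vectors of different lengths, the sum in (2.1) is most easily evaluated subject by subject. The Python sketch below does this for an AR(1) working structure; the function names and residual values are illustrative assumptions, and the residual vectors are taken as already standardized.

```python
import numpy as np

def generalized_error_ss(Z_list, R_list):
    """Q = sum_i Z_i' R_i^{-1} Z_i for per-subject residual vectors and correlation matrices."""
    return float(sum(z @ np.linalg.solve(R, z) for z, R in zip(Z_list, R_list)))

def ar1_corr(n, rho):
    """n x n AR(1) correlation matrix with entries rho^|j - k|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

# Unbalanced illustrative data: n_1 = 3 and n_2 = 2 standardized residuals
Z_list = [np.array([1.0, 0.5, -0.2]), np.array([0.3, -0.4])]
Q = generalized_error_ss(Z_list, [ar1_corr(len(z), 0.5) for z in Z_list])
```

With $\rho = 0$ the quantity reduces to the ordinary sum of squared standardized residuals.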

2.2  A  Description  of  the  Methods 

Here we describe the method of QLS and make a brief comparison with the original formulation of
GEE. [For a more detailed description in the balanced data setting, see Chaganty (1997).] QLS uses
a partial derivative of the generalized error sum of squares (2.1) to derive the following estimating
equation for $\beta$:

\[
\sum_{i=1}^{m} D_i'A_i^{-1/2}R_i^{-1}(\rho)Z_i(\beta) = 0. \qquad (2.2)
\]

The estimating equation for $\rho$ is obtained by differentiating (2.1) with respect to $\rho$, yielding

\[
\sum_{i=1}^{m} Z_i'(\beta)\frac{\partial R_i^{-1}(\rho)}{\partial\rho}Z_i(\beta) = 0. \qquad (2.3)
\]

To obtain QLS estimates $(\hat\rho, \hat\beta)$ for $(\rho, \beta)$, we select a starting value $\tilde\beta$ for $\beta$ (or a starting value
$\tilde\rho$ for $\rho$) and then iterate between solving (2.3) for $\rho$ (the rho step) and solving (2.2) for $\beta$ (the
beta step) until the estimates of $\beta$ converge. GEE also uses estimating equation (2.2) for $\beta$ and
alternates between estimation of $\rho$ and $\beta$, though it requires the use of $m^{1/2}$-consistent estimates
of $\rho$. In practice, moment estimates of $\rho$ that are based on the current values of the standardized
residuals ($z_{ij}$) are often used. If $\tau_j$ is an unknown parameter, it can be consistently estimated using
$\hat\tau_j = (1/m)\sum_{i=1}^{m}\hat z_{ij}^2$, where $\hat z_{ij}$ is $z_{ij}$ evaluated at $\hat\beta$. If $\tau_j = \tau$ for all $j$, we use the consistent
estimate $\hat\tau = (1/n)\sum_{i=1}^{m}\sum_{j=1}^{n_i}\hat z_{ij}^2$, where $n = \sum_{i=1}^{m} n_i$.
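
The alternation between the rho step and the beta step can be sketched for one concrete case. The Python code below assumes an identity link, a constant variance function (so that $A_i = I$ and $D_i = X_i$), and an AR(1) working structure, for which the rho step is taken to have the closed-form root derived for the balanced case applied to sums over subjects of varying length. The function names, the starting value and the simulated data are all illustrative assumptions.

```python
import numpy as np

def ar1_corr(n, rho):
    """n x n AR(1) correlation matrix with entries rho^|j - k|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def rho_step(Z_list):
    """Solve Eq. (2.3) for an AR(1) structure via the root of a rho^2 - 2 b rho + a = 0."""
    a = sum(2.0 * np.dot(z[:-1], z[1:]) for z in Z_list)
    b = sum(np.dot(z, z) + np.dot(z[1:-1], z[1:-1]) for z in Z_list)
    return a / (b + np.sqrt(max(b * b - a * a, 0.0)))

def beta_step(X_list, Y_list, rho):
    """Solve Eq. (2.2) for beta: generalized least squares with the current rho."""
    lhs = sum(X.T @ np.linalg.solve(ar1_corr(len(y), rho), X) for X, y in zip(X_list, Y_list))
    rhs = sum(X.T @ np.linalg.solve(ar1_corr(len(y), rho), y) for X, y in zip(X_list, Y_list))
    return np.linalg.solve(lhs, rhs)

def qls_ar1(X_list, Y_list, tol=1e-10, max_iter=100):
    """Alternate the rho step and the beta step until beta stabilizes."""
    beta = beta_step(X_list, Y_list, 0.0)   # starting value: ordinary least squares
    rho = 0.0
    for _ in range(max_iter):
        rho = rho_step([y - X @ beta for X, y in zip(X_list, Y_list)])
        new_beta = beta_step(X_list, Y_list, rho)
        done = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if done:
            break
    return beta, rho

# Illustrative unbalanced data: 30 subjects with 2 to 4 observations each
rng = np.random.default_rng(1)
X_list, Y_list = [], []
for n in rng.integers(2, 5, size=30):
    X = np.column_stack([np.ones(n), np.arange(n)])
    e = np.linalg.cholesky(ar1_corr(n, 0.5)) @ rng.standard_normal(n)
    X_list.append(X)
    Y_list.append(X @ np.array([1.0, 2.0]) + e)
beta_hat, rho_hat = qls_ar1(X_list, Y_list)
```

For other links or variance functions the beta step would require an iterative solver, and for other working structures the rho step would generally be a one-dimensional numerical minimization.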

2.3  Choosing  a  Correlation  Structure 

To  choose  among  competing  structures,  Diggle,  Liang,  and  Zeger  (1994,  p.  145)  suggest  fitting 
different  correlation  models  and  then  comparing  the  corresponding  final  estimates  of  the  regression 
parameter  and  their  standard  errors.  “If  they  differ  substantially,  a  more  careful  treatment  of  the 
covariance  model  may  be  necessary.”  There  are  two  drawbacks  to  this  approach.  First,  it  relies  on 
asymptotic  standard  errors.  As  pointed  out  by  Lindsey  (1993,  p.  68),  decisions  based  on  asymptotic 
standard  errors  may  be  inappropriate  for  small  samples.  Second,  if  more  careful  modeling  of  the 
correlation structure seems necessary, there is ambiguity as to how one should then proceed with
the analysis.

In contrast to the method of GEE, QLS provides a basis for nonasymptotic comparison of
different correlation structures. Since the method estimates $\rho$ by minimizing (2.1) with respect to
$\rho$, given several alternatives, a natural choice of correlation structure is the structure that minimizes
the generalized error sum of squares, with an adjustment for the total number of parameters in
the model. If, e.g., we are analyzing data in which the correlation among measurements on each
subject is expected to decrease with increased separation in time, we might reasonably apply the
structures considered in Section 3. For a given set of covariates and a specified link function and
mean-variance relationship, we might obtain QLS estimates of $(\rho, \beta)$ for each structure, choosing
as our final correlation model the structure that corresponds to the minimum value of the adjusted
residual generalized sum of squares $Q_a = Q(\hat\rho, \hat\beta)/(n - p - s)$, where $n = \sum_{i=1}^{m} n_i$. Our criterion
generalizes to correlated data the least-squares approach of minimizing the residual sum of squares.
It is analogous to Akaike's Information Criterion (AIC) and Schwarz's Bayesian Criterion (SBC),
two criteria commonly used for covariance model selection in analyses of normally distributed data
(see Littell et al., 1996, p. 101).
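
Applying the criterion amounts to dividing each fitted value of the generalized error sum of squares by its adjusted degrees of freedom and picking the smallest result. A minimal Python sketch, in which the fitted values of $Q$, the sample size and the parameter counts are hypothetical numbers chosen only for illustration:

```python
def adjusted_criterion(Q, n, p, s):
    """Q_a = Q / (n - p - s): n observations, p regression and s correlation parameters."""
    return Q / (n - p - s)

# Hypothetical fitted values of Q(rho_hat, beta_hat) and parameter counts s per structure
fits = {"AR(1)": (118.4, 1), "Markov": (112.9, 1), "generalized Markov": (111.8, 2)}
n, p = 120, 3
scores = {name: adjusted_criterion(Q, n, p, s) for name, (Q, s) in fits.items()}
best = min(scores, key=scores.get)
```

The extra correlation parameter of a richer structure is charged through the denominator, so a richer structure is preferred only if it lowers $Q$ by enough.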

Another approach that can be used to compare the fit of two correlation models is construction of
an empirical semivariogram (Diggle et al., 1994, p. 82) or the two graphical techniques, draftsman's
display and parallel axis plots, suggested by Dawson, Gennings, and Carter (1997).

3.  Three  Useful  Spatial  Correlation  Models 

Here  we  discuss  three  useful  correlation  structures,  interpret  their  parameters,  and  discuss 
implementation  of  QLS  for  each  structure. 


3.1  The  Correlation  Structures 

In  what  follows,  p  =  (pi ,  P2)  —  (Q>  A).  Let  Ri(a,X)  =  [rj-j.],  where  =  a  ,J+1  for  k  >  j, 

r*  =  1,  r*fc  =  rj.  for  fc  <  j,  and  ey  is  a  function  of  (a,  A).  In  this  paper,  we  consider  three 
special  cashes  of  #/(a,A):  the  AR(1),  for  which  ey  =  1  for  all  i  and  j;  the  Markov,  for  which 
ey  =  tij  -  ty_i ;  and  the  generalized  Markov,  for  which 


eij  ~ 


if  A  ^  0;  0  <  t{j— 1  <  tij 
if  A  =  0;  0  <  tij— 1  <  t{j . 


(3.1) 


We can easily verify that $R_i(\alpha, \lambda)$ has a unique Cholesky decomposition $\Gamma_i(\alpha, \lambda)\Gamma_i'(\alpha, \lambda)$, where
$\Gamma_i(\alpha, \lambda)$ is a lower triangular matrix (see the Appendix). Since the $k$th diagonal element of $\Gamma_i(\alpha, \lambda)$
is $(1 - \alpha^{2e_{ik}})^{1/2}$ and $R_i(\alpha, \lambda)$ is positive definite if and only if all the diagonal elements of $\Gamma_i(\alpha, \lambda)$
are positive, feasible values of $\alpha$ are those values for which $1 - \alpha^{2e_{ik}}$ is defined and positive for all
$k$ and $i$.
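
The three structures and the feasibility condition can be sketched numerically. The following is a minimal illustration in Python with NumPy, not the authors' program (which was written in STATA); the helper names `spacings` and `corr_matrix`, the example timings, and the values $\alpha = 0.6$, $\lambda = 0.5$ are assumptions chosen for illustration only.

```python
import numpy as np

def spacings(times, structure, lam=None):
    """Exponents e_{i2}, ..., e_{i n_i} for one subject (hypothetical helper)."""
    t = np.asarray(times, dtype=float)
    if structure == "ar1":
        return np.ones(len(t) - 1)          # e_ij = 1
    if structure == "markov":
        return np.diff(t)                   # e_ij = t_ij - t_{i,j-1}
    if structure == "gen_markov":
        if lam == 0:
            return np.diff(np.log(t))       # ln(t_ij / t_{i,j-1})
        return np.diff(t ** lam) / lam      # (t_ij^lam - t_{i,j-1}^lam) / lam
    raise ValueError(structure)

def corr_matrix(alpha, e):
    """R_i with r_jk = alpha ** (e_{j+1} + ... + e_k) for k > j."""
    c = np.concatenate(([0.0], np.cumsum(e)))   # cumulative exponents
    return alpha ** np.abs(c[:, None] - c[None, :])

times = [2.0, 7.0, 8.0]                     # illustrative timings
e = spacings(times, "gen_markov", lam=0.5)
R = corr_matrix(0.6, e)
G = np.linalg.cholesky(R)                   # lower triangular, R = G G'
diag_formula = np.sqrt(1 - 0.6 ** (2 * e))  # (1 - alpha^{2 e_ik})^{1/2}, k >= 2
```

Under these assumptions the numerically computed Cholesky factor has diagonal entries (from the second onward) equal to `diag_formula`, which is the quantity that must be defined and positive for $\alpha$ to be feasible.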

Bounds on $\alpha$ for each structure are as follows. The parameter $\alpha \in (-1, 1)$ for the AR(1) structure
and also for the Markov structure if the $e_{ij}$ are integer valued. To allow $\alpha$ to take on negative
values for the Markov structure, if necessary, we could change the time scale so that the $e_{ij}$ are
integers or are suitably defined so that all values of $1 - \alpha^{2e_{ik}}$ are defined and positive. This may be
problematic, however. Suppose that $e_{i,j+1}$ is odd, so that $\mathrm{corr}(y_{ij}, y_{i,j+1})$ may be either positive or
negative. Changing the time scale to make $e_{i,j+1}$ even is equivalent to assuming that $\mathrm{corr}(y_{ij}, y_{i,j+1})$
is positive. Since our basic assumptions regarding the model should be invariant to choice of time
scale, we use the Markov structure only when we expect the intrasubject correlation to be positive,
which is the case in most biological applications. We thus restrict $\alpha$ to the interval $(0, 1)$. For the
generalized Markov, $e_{ij} > 0$ for any fixed $\lambda \in (-\infty, \infty)$ and we restrict $\alpha$ to $(0, 1)$, as for the
Markov structure.

The parameters $(\alpha, \lambda)$ have a useful interpretation for longitudinal data analysis. For the
AR(1) or Markov structure, the correlation between measurements on one subject decreases with
increasing difference in order or timing of measurements, respectively. The Markov structure is
appropriate for unequally spaced observations and may be used as an alternative to imputation
of missing values. The generalized Markov structure introduces an additional parameter $\lambda$ to the
Markov structure that greatly increases its flexibility. We first note that the generalized model allows
for accelerated, or decelerated, decay in the correlation between measurements for a fixed value of $\alpha$
since $(t_{ik}^{\lambda} - t_{ij}^{\lambda})/\lambda$ increases towards $\infty$ as $\lambda \to \infty$ and decreases towards zero as $\lambda \to -\infty$. (Here we
have assumed that (i) $t_{ij} > 1$, which can be achieved through reparameterization in the time scale, if
necessary, and (ii) that $t_{ij} < t_{ik}$, for all $i$, and $k > j$.) This is useful because one potential difficulty
in applying the Markov structure is that it may force the correlations between measurements to
decrease too rapidly with increasing separation in time. Other researchers, including Munoz et al.
(1992), also used parameters to dampen the correlation in the Markov structure. The generalized
Markov model extends the Markov structure so that the correlation between measurements is not
just a function of their separation in time but also of their time of occurrence in the study. This
is because $\lim_{\lambda \to 0} (t_{ik}^{\lambda} - t_{ij}^{\lambda})/\lambda = \ln(t_{ik}/t_{ij}) = \ln(1 + w/t_{ij})$, where $t_{ik} = w + t_{ij}$. Since, for fixed
$w$, $\lim_{t_{ij} \to \infty} \ln(1 + w/t_{ij}) = 0$, for values of $\lambda$ that are small in absolute value, responses in the
outcome variable that are separated by $w$ time units will be more highly correlated if they are
observed later in the study than if they are observed earlier. This generalization will be useful if we
are dealing with outcome variables, such as growth in humans, that become more highly correlated
over time.

3.2  Implementing  the  Method  of  Quasi-Least  Squares

To simplify programming the beta step of QLS, we use the Cholesky decomposition $R_i^{-1}(\alpha, \lambda) =
L_i(\alpha, \lambda)L_i'(\alpha, \lambda)$ (see the Appendix) and an approach similar to that described in Lindsey (1993).
Using current estimates $\hat\alpha$ and $\hat\lambda$, we calculate $T_i = L_i'(\hat\alpha, \hat\lambda)A_i^{-1/2}\hat\epsilon_i$ and $S_i = L_i'(\hat\alpha, \hat\lambda)A_i^{-1/2}D_i$
for all $i$, where $\hat\epsilon_i = Y_i - \hat\mu_i$. We then regress $T = (T_1', T_2', \ldots, T_m')'$ on $S = (S_1', S_2', \ldots, S_m')'$ to
obtain an adjustment that is added to our previous estimate of $\beta$. The estimating process ends
when this adjustment is approximately zero.
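
The beta step above can be sketched as an ordinary least-squares regression of the stacked $T$ on the stacked $S$. This is a minimal sketch under stated assumptions, not the authors' STATA code: it assumes an identity link (so $D_i = X_i$), a constant variance `var` (so $A_i^{-1/2}$ is a scaled identity), and obtains $L_i$ numerically from `np.linalg.cholesky` rather than from the closed form in the Appendix. The function name `beta_step` is hypothetical.

```python
import numpy as np

def beta_step(beta, X_list, y_list, R_list, var=1.0):
    """One QLS beta-step adjustment for an identity link (illustrative)."""
    T_blocks, S_blocks = [], []
    for X, y, R in zip(X_list, y_list, R_list):
        L = np.linalg.cholesky(np.linalg.inv(R))     # R^{-1} = L L'
        A_inv_sqrt = np.eye(len(y)) / np.sqrt(var)   # A_i^{-1/2}, constant variance
        resid = y - X @ beta                         # Y_i - mu_i
        T_blocks.append(L.T @ A_inv_sqrt @ resid)    # T_i
        S_blocks.append(L.T @ A_inv_sqrt @ X)        # S_i, with D_i = X_i
    T = np.concatenate(T_blocks)
    S = np.vstack(S_blocks)
    delta, *_ = np.linalg.lstsq(S, T, rcond=None)    # regress T on S
    return beta + delta                              # updated estimate of beta
```

For a linear model a single call already reaches the generalized least-squares solution for the given correlation matrices, so a second call returns an adjustment that is numerically zero; for nonlinear links the step would be repeated with $D_i$ and $A_i$ recomputed at each iteration.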

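To implement the $\rho$ step, we again use the Cholesky decomposition to reexpress (2.1) as

$$
Q(\alpha, \lambda, \beta) = \sum_{i=1}^{m} \sum_{j=2}^{n_i}
\frac{z_{ij}^2 - 2\alpha^{e_{ij}} z_{ij} z_{i,j-1} + z_{i,j-1}^2}{1 - \alpha^{2e_{ij}}}
\; - \sum_{i:\, n_i > 2} \; \sum_{j=2}^{n_i - 1} z_{ij}^2.
\tag{3.2}
$$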

For the AR(1) structure, we used differentiation and simple arithmetic to obtain the following
unique point of minimum of (3.2) in the interval $(-1, 1)$:

$$
\hat\alpha =
\frac{\sum_{i=1}^{m} \sum_{j=2}^{n_i} (z_{ij}^2 + z_{i,j-1}^2)
- \sqrt{\sum_{i=1}^{m} \sum_{j=2}^{n_i} (z_{ij} - z_{i,j-1})^2 \; \sum_{i=1}^{m} \sum_{j=2}^{n_i} (z_{ij} + z_{i,j-1})^2}}
{2 \sum_{i=1}^{m} \sum_{j=2}^{n_i} z_{ij} z_{i,j-1}}.
\tag{3.3}
$$

For the Markov correlation structure, we used a modified Newton-Raphson method to minimize
(3.2) with respect to $\alpha$ over $(0, 1)$. For the generalized Markov structure, we wrote a grid search
program. All programs were written in STATA.
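
As an illustration of the closed form (3.3), the following Python sketch computes the AR(1) estimate from per-subject vectors of standardized residuals and evaluates the objective (3.2) with $e_{ij} = 1$. It is a hedged sketch, not the original STATA program; the function names `qls_alpha_ar1` and `q_ar1` are hypothetical, and the code assumes the lag-one cross-product sum in the denominator is nonzero.

```python
import numpy as np

def qls_alpha_ar1(z_list):
    """Closed-form AR(1) estimate (3.3); z_list holds one residual vector per subject."""
    a = sum(np.sum(z[1:] ** 2 + z[:-1] ** 2) for z in z_list)    # sum of squares term
    b = sum(np.sum(z[1:] * z[:-1]) for z in z_list)              # lag-one cross products
    minus = sum(np.sum((z[1:] - z[:-1]) ** 2) for z in z_list)   # equals a - 2b
    plus = sum(np.sum((z[1:] + z[:-1]) ** 2) for z in z_list)    # equals a + 2b
    return (a - np.sqrt(minus * plus)) / (2.0 * b)               # assumes b != 0

def q_ar1(alpha, z_list):
    """Objective (3.2) with e_ij = 1 for all i and j."""
    total = 0.0
    for z in z_list:
        num = z[1:] ** 2 - 2 * alpha * z[1:] * z[:-1] + z[:-1] ** 2
        total += np.sum(num) / (1 - alpha ** 2) - np.sum(z[1:-1] ** 2)
    return total
```

Setting the derivative of (3.2) to zero for $e_{ij} = 1$ gives the quadratic $b\alpha^2 - a\alpha + b = 0$ in the notation of the code comments, and (3.3) is its root inside $(-1, 1)$.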


4.  Comparison  with  the  Method  of  GEE 

Here  we  use  simulations  to  compare  the  methods  of  GEE  and  QLS  when  the  true  and  working 
correlation  structures  are  both  AR(1)  (Section  4.2),  Markov  (Section  4.2),  and  generalized  Markov 
(Section  4.3). 

4.1  The  Model  for  the  Simulations 

We consider a data set collected according to a two-treatment cross-over design because incorrectly
assuming that the observations are uncorrelated in this setting can lead to a severe loss of efficiency
in estimation of $\beta$ (see Diggle et al., 1994, pp. 60-61). The model for our simulations is $Y_i = X_i\beta + \epsilon_i$,
where

$$
X_i' = \begin{pmatrix} 1 & 1 & 1 \\ x_{i1} & x_{i2} & x_{i3} \end{pmatrix}
$$

and $\beta' = (\beta_0, \beta_1)$, $i = 1, 2, \ldots, 8$. The treatment sequences $(x_{i1}, x_{i2}, x_{i3})$ comprise all distinct
permutations of zeros and ones for $i = 1, 2, \ldots, 8$. For the AR(1) structure, the measurements
are equally spaced. For the Markov and generalized Markov structures, the vector of timings
$(t_{i1}, t_{i2}, t_{i3})$ is given by $(2, 7, 8)$ for $i = 1, 2, 3, 4$, $(2, 3, 8)$ for $i = 5, 6, 7$, and $(2, 5, 9)$ for $i = 8$.
We allow the timings to vary between subjects because, although many study protocols call for
a common set of measurement times, in practice, this goal is not often achieved. This lack of a
common set of timings means that assuming a common unstructured correlation matrix for all
subjects, as is often done in practice, may not be appropriate. We assume constant variance, i.e.,
$\mathrm{var}(y_{ij}) = \tau$ for $j = 1, 2, 3$. Correlation in the data is induced by $\epsilon_i$, which is assumed to be multivariate
normal with mean zero and covariance $\tau R_i$. We set $\tau = 4$ and $(\beta_0, \beta_1) = (120, -12.88)$. The
correlation matrix $R_i$ has AR(1) structure in Section 4.2, Markov structure in Section 4.2, and
generalized Markov structure in Section 4.3.
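
A single simulated data set from this design might be generated as follows. This is a sketch only, using the Markov structure; the pairing of treatment sequences with timing vectors, the random seed, and the helper names `markov_corr` and `simulate` are assumptions, since the text does not specify them.

```python
import numpy as np

rng = np.random.default_rng(1998)             # arbitrary seed
tau, beta = 4.0, np.array([120.0, -12.88])    # values stated in Section 4.1

# All 2**3 = 8 sequences of zeros and ones over three periods.
treatments = np.array([[(i >> k) & 1 for k in (2, 1, 0)] for i in range(8)])
timings = [(2, 7, 8)] * 4 + [(2, 3, 8)] * 3 + [(2, 5, 9)]

def markov_corr(alpha, t):
    """Markov correlation matrix: alpha ** |t_j - t_k|."""
    t = np.asarray(t, dtype=float)
    return alpha ** np.abs(t[:, None] - t[None, :])

def simulate(alpha):
    """Return a list of (X_i, Y_i) pairs with Y_i = X_i beta + eps_i."""
    data = []
    for x, t in zip(treatments, timings):
        X = np.column_stack([np.ones(3), x])
        eps = rng.multivariate_normal(np.zeros(3), tau * markov_corr(alpha, t))
        data.append((X, X @ beta + eps))
    return data
```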

4.2  Comparisons  for  the  AR(1)  and  Markov  Structures 

For the AR(1) correlation structure, we made our comparisons using the closed-form QLS estimate
of $\alpha$ and the following GEE estimate of $\alpha$ that is used in the "SAS Macro for Longitudinal Data
Analysis" (Groemping, 1994):

$$
\hat\alpha_G = \frac{(n - p) \sum_{i=1}^{m} \sum_{j=1}^{n_i - 1} z_{ij} z_{i,j+1}}
{(n - m - p) \sum_{i=1}^{m} \sum_{j=1}^{n_i} z_{ij}^2},
\tag{4.1}
$$

where $n = \sum_{i=1}^{m} n_i$. For the Markov correlation structure, Liang and Zeger (1986, example 4) suggest
an ad hoc estimate of $\alpha$. They first note that $E(z_{ij} z_{ik}) = \tau \alpha^{|t_{ij} - t_{ik}|}$. Substituting the current
estimate $z_{ij} z_{ik}$ for $E(z_{ij} z_{ik})$ and then taking logarithms yields $\ln(z_{ij} z_{ik}) \approx \ln(\tau) + \ln(\alpha)|t_{ij} - t_{ik}|$.
A natural estimate of $\ln(\alpha)$ is then given by the slope of the regression line of $\ln(z_{ij} z_{ik})$ on
$|t_{ij} - t_{ik}|$. (When programming the method, we regressed $\ln(|z_{ij} z_{ik}|)$ on $|t_{ij} - t_{ik}|$ since $z_{ij} z_{ik}$
may take on negative values.) Infeasibility is a problem for this ad hoc estimate, so we used
a modified estimate that was not so often infeasible during simulation runs. Consider the set
$\{d_1 < d_2 < \cdots < d_l\}$ of distinct values of spacings between any two measurements on one
subject. Let $s_w$ be the number of pairs $(t_{ij}, t_{ik})$ such that $|t_{ij} - t_{ik}| = d_w$. Since $E(z_{ij} z_{ik}) = \tau \alpha^{d_w}$
if $|t_{ij} - t_{ik}| = d_w$, we estimate $\ln(\alpha)$ by the slope of the regression line of $\ln(H_w)$ on $d_w$, where

$$
H_w = (1/s_w) \sum_{\{(t_{ij}, t_{ik}):\, |t_{ij} - t_{ik}| = d_w\}} z_{ij} z_{ik}.
$$
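
The modified estimate can be sketched in a few lines of Python. This is an illustration rather than the authors' program: the function name `markov_gee_alpha` is hypothetical, and taking the absolute value of $H_w$ before the logarithm is an assumption made here (mirroring the parenthetical remark about $|z_{ij} z_{ik}|$) so that the code does not fail when an average is negative. At least two distinct spacings are needed for the regression.

```python
import numpy as np
from collections import defaultdict

def markov_gee_alpha(z_list, t_list):
    """Slope-based estimate of alpha from spacing-averaged residual products H_w."""
    products = defaultdict(list)
    for z, t in zip(z_list, t_list):
        for j in range(len(t)):
            for k in range(j + 1, len(t)):
                products[abs(t[k] - t[j])].append(z[j] * z[k])
    d = np.array(sorted(products))                      # distinct spacings d_w
    H = np.array([np.mean(products[w]) for w in d])     # H_w
    slope = np.polyfit(d, np.log(np.abs(H)), 1)[0]      # slope estimates ln(alpha)
    return np.exp(slope)                                # may exceed 1 (infeasible)
```

Nothing in this construction constrains the returned value to $(0, 1)$: a slope close to zero or positive gives an estimate at or above one.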

We compared QLS and GEE for the Markov and AR(1) structures with regard to infeasibility and
efficiency. Our simulations demonstrate the problem that GEE may have regarding infeasibility of
its correlation parameter estimates. When the true and working structures are AR(1) and the data
are equally spaced, approximately 10% of simulation runs yielded infeasible final GEE estimates
of $\alpha$. For the Markov structure and unequally spaced data, infeasibility was a greater problem.
Although $H_w$ is a superior estimate to $z_{ij} z_{ik}$, it may not estimate $E(z_{ij} z_{ik})$ precisely for small
samples, so that the slope of the regression line used to construct the Markov GEE estimates may
be close to zero, resulting in an estimate of $\alpha$ that is close to one and maybe greater than one.
In our simulations, the final GEE Markov correlation parameter estimates were positively biased
for all values of $\alpha$ and were often infeasible. In some simulation runs, over 30% of the final GEE
estimates of $\alpha$ were infeasible. Because GEE is usually implemented using moment estimates of the
correlation parameters, feasibility of these estimates is not guaranteed. For ad hoc estimates, such
as the Markov GEE estimate considered here, the likelihood of obtaining an infeasible correlation
parameter estimate may be high. Since QLS will always yield feasible estimates for continuous
outcome variables, QLS might prove useful in providing a feasible correlation parameter estimate
should the method of GEE fail to do so.

Table 1 contains the efficiency of the QLS regression parameter estimates with respect to the GEE
estimates, that is, the ratio of the mean square error of the GEE estimates to the mean square error
of the QLS estimates, when the true and working correlation structures are
both AR(1) or both Markov. Simulation runs that yielded GEE estimates of $\alpha$ that were infeasible
were not used in the comparison and the efficacy of GEE is thus overstated. Table 1 shows that
QLS estimates $\beta$ more efficiently than GEE when the intrasubject correlation is small to moderate
($\alpha < 0.5$), which is the case in most biological applications. For $\alpha > 0.5$, GEE estimates $\beta$ more
precisely than QLS, though it is important to bear in mind that, for all values of $\alpha$, GEE may
yield an infeasible final estimate of the correlation parameter. The relative performance of the
two methods for larger values of $\alpha$ is probably due to properties of the moment estimates used
by GEE to estimate $\alpha$ for the AR(1) and Markov correlation structures, which had lower mean
square error for values of $\alpha \approx 1$. Simulations conducted by the authors also confirmed what Diggle
et al. (1994, pp. 60-61) observed: incorrectly assuming that the outcomes are uncorrelated in this
setting leads to inefficiency in estimation of $\beta$, especially as the intrasubject correlation increases
in value. Simulations were also performed for sample sizes 16, 32, 64, and 128. Even for the larger
samples, QLS outperformed GEE in terms of mean square error for $\alpha < 0.5$. However, the gain in
performance decreased with increasing $m$.


4.3  Comparisons  for  the  Generalized  Markov  Structure 

To demonstrate the difficulties involved, we attempt to extend Liang and Zeger's (1986) ad hoc
approach to estimation of $\alpha$ for the generalized Markov structure. For simplicity, assume that each
of $m$ subjects has measurement times that are a subset of a common set of measurement times
$\{t_1 < t_2 < \cdots < t_N\}$. Now consider the set of spacings $\{e_{jk}(\lambda) = (t_k^{\lambda} - t_j^{\lambda})/\lambda;\ 1 \le j < k \le N\}$.


Table 1

Efficiency of QLS regression parameter estimate with respect to
the GEE estimate for the AR(1) and Markov correlation structures

            Group parameter        Constant parameter
$\alpha$    AR(1)     Markov       AR(1)     Markov
0.1         1.12      1.75         1.07      1.29
0.3         1.07      1.42         1.03      1.14
0.5         0.98      1.11         0.99      1.04
0.7         0.84      0.87         0.96      0.96
0.9         0.63      0.71         0.96      0.96

1628 


Biometrics,  December  1998 


Table 2

MSEs of the regression parameter estimates obtained using QLS with three correlation structures

             Group parameter                          Constant parameter
$\lambda$    Identity   Markov   Generalized Markov   Identity   Markov   Generalized Markov
-5           2.61       0.09     0.05                 2.54       1.90     1.89
-3           2.63       0.31     0.31                 2.34       1.74     1.74
 1           2.61       2.13     2.10                 1.62       1.47     1.46
 3           2.61       2.65     2.65                 1.31       1.30     1.32


Let $n_{ijk} = 1$ if subject $i$ has measurement times $t_j$ and $t_k$, and let $n_{ijk} = 0$ otherwise. Let
$s_{jk} = \sum_{i=1}^{m} n_{ijk}$ and $H_{jk} = (1/s_{jk}) \sum_{\{i:\, n_{ijk} = 1\}} z_{ij} z_{ik}$. Since $E(z_{ij} z_{ik}) = \tau \alpha^{e_{jk}(\lambda)}$, we can estimate
$(\alpha, \lambda)$ using nonlinear regression between $H_{jk}$ and $\tau \alpha^{e_{jk}(\lambda)}$. Clearly, this procedure does not yield
simple feasible and consistent estimates for the correlation parameters. The generalized Markov
structure is thus not easily applied using GEE.

We conducted simulations according to the model described in Section 4.1, in which the actual
correlation structure is generalized Markov. Table 2 contains the mean square error (MSE) of
the QLS estimates of the regression parameters for $\alpha = 0.6$ and for various values of $\lambda$, when
the working correlation structure is the identity, Markov, and generalized Markov. Note that, for
$\lambda = 3$, the independence model performs well. This is appropriate because, in this case, each
$\mathrm{corr}(y_{ij}, y_{ik}) \approx 0$. Table 2 indicates that correctly specifying the generalized Markov structure
reduces the mean square error so that, in addition to including an extra parameter that aids in our
interpretation of the data, this more general structure also allows for more precise estimation of $\beta$.

4.4  Example 

Here  we  apply  QLS  in  an  analysis  of  a  data  set  that  contains  varying  numbers  of  unequally 
spaced  measurements  per  subject  to  demonstrate  use  of  the  method  to  select  an  appropriate 
correlation  structure  and  to  identify  potentially  important  covariates.  The  data  we  consider  (see 
Nunez-Anton  and  Woodworth,  1994,  Figure  3)  were  collected  during  a  study  designed  to  compare 
different  cochlear  prostheses  implanted  in  a  group  of  postlingually  deafened  adults  (the  Iowa 
Cochlear  Implant  (ICI)  Project;  Gantz  et  al.,  1988).  The  outcome  variable  is  the  percentage  of 
correct  responses  on  a  sentence  recognition  test  that  was  administered  at  1,  9,  18,  and  30  months 
postimplantation.  Covariates  include  time  of  measurement  and  type  of  implant  (A  or  B).  Due  to 
loss  of  follow-up,  incomplete  data  were  available  on  treatment  groups  0  and  1,  comprising  23  and 
21 subjects who were implanted with prostheses A and B, respectively. To determine if there is
a difference in test scores over time between the two treatment groups we used QLS to fit the
following model to the full data set:

$$
E(y_{ij}) = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 x_i; \quad \mathrm{var}(y_{ij}) = \tau; \quad \mathrm{corr}(Y_i) = R_i(\alpha, \lambda),
$$

where $y_{ij}$ is the percentage correct for test $j$ on subject $i$, $x_i$ is the group indicator variable, $t_{ij}$
is the month in which measurement $y_{ij}$ was made for $j = 1, 2, \ldots, n_i$ and $i = 1, 2, \ldots, 46$. We
consider several working correlation structures for $R_i(\alpha, \lambda)$, including the identity, AR(1), Markov,
and generalized Markov.

Nunez-Anton and Woodworth (1994) fit the above model to data on subjects who attained at
least a 5% improvement over baseline. After confirming multivariate normality for the outcome
variable  in  this  subset  of  the  data,  they  carried  out  their  analysis  using  the  REML  approach 
discussed  in  Harville  (1974).  To  provide  motivation  for  using  an  alternative  approach,  we  consider 
the  full  data  set,  which  may  not  be  normally  distributed,  as  indicated  by  an  apparent  lack  of 
normality  in  the  test  scores  at  1  and  9  months. 

Table 3 contains the regression and correlation parameter estimates for the working correlation
structures in Section 3.1. According to the criterion proposed in Section 2.3, the generalized Markov
structure is the appropriate structure for these data since it corresponds to the minimum value of
the adjusted residual generalized sum of squares $Q_a = Q(\hat\rho, \hat\beta)/(n - p - q)$. We also note that $Q_a$
may be used as a rough guide to covariate selection for the final model. For example, if we delete
$t_{ij}^2$ from the model, $Q_a = 473.00$ under an assumption of generalized Markov correlation structure,
which represents an approximately 8% increase over its value in the original model. This indicates
that $t_{ij}^2$ may be an important covariate to retain in the final model. We also note that fitting the
generalized Markov correlation structure allows us to infer, as did Nunez-Anton and Woodworth
(1994), that, since $\lambda \approx 0$, test scores on one subject tend to stabilize over time. Inclusion of $\lambda$ in
the generalized Markov structure thus aids in our interpretation of the data.

Table 3

Regression, correlation parameter estimates, and adjusted
residual generalized sum of squares for ICI Project data

Working correlation structure   $\beta_0$   $\beta_1$   $\beta_2$   $\beta_3$   $\alpha$   $\lambda$   $Q_a$
Identity                        12.97       2.31        -0.05       9.40        -          -           761.23
AR(1)                           11.58       2.27        -0.04       11.00       0.64       -           445.43
Markov                          11.58       2.32        -0.05       10.73       0.95       -           456.91
Generalized Markov              12.15       2.12        -0.04       11.29       0.84       0.39        439.50
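
The selection rule and the covariate check described in this example amount to simple arithmetic on the reported $Q_a$ values. The short sketch below uses only the numbers given in Table 3 and in the text; the variable names are illustrative.

```python
# Adjusted residual generalized sums of squares reported in Table 3.
q_a = {
    "Identity": 761.23,
    "AR(1)": 445.43,
    "Markov": 456.91,
    "Generalized Markov": 439.50,
}

# Criterion of Section 2.3: choose the structure with the smallest Q_a.
best = min(q_a, key=q_a.get)

# Relative increase in Q_a when the quadratic time term is deleted (Q_a = 473.00).
increase = 473.00 / q_a[best] - 1.0
```

Here `best` is the generalized Markov structure and `increase` is about 0.076, which the text rounds to approximately 8%.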
Summary

Quasi-least squares (QLS), a marginal statistical approach based on generalized estimating
equations that was described for balanced data by Chaganty (1997, J. Statist. Plann. Inference
63, 39-54), allows the use of a wide range of working correlation structures when analyzing
serially correlated data. We extend the application of QLS to serially correlated data that are
unequally spaced and unbalanced, using three useful working correlation models: the first-order
autoregressive (AR(1)), the Markov, and the generalized Markov structure described by
Nunez-Anton and Woodworth (1994, Biometrics 50, 445-456). We compare QLS and the
original formulation of generalized estimating equations (GEE) for these structures, demonstrating
that (i) infeasibility of the correlation parameter estimates can be a problem; (ii) it is difficult
to obtain consistent moment estimates of the correlation parameters for the generalized Markov
structure; (iii) use of QLS can lead to a reduction in the mean square error of the regression
parameter estimates for small samples with moderately correlated data. To choose among
alternative correlation models, we propose a criterion based on the principle of generalized least
squares. Finally, data for which the generalized Markov structure is appropriate are analyzed to
demonstrate the use of QLS in choosing a suitable working correlation structure and in
identifying important covariates.
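Appendix

The generalized Markov structure $R_i(\alpha, \lambda)$ has a unique Cholesky decomposition given by
$\Gamma_i(\alpha, \lambda)\Gamma_i'(\alpha, \lambda)$, where $\Gamma_i(\alpha, \lambda) = [\gamma^i_{jk}]$ is a lower triangular matrix and

$$
\gamma^i_{jk} =
\begin{cases}
1 & \text{if } j = 1;\ k = 1 \\
\alpha^{e_{i2} + e_{i3} + \cdots + e_{ij}} & \text{if } k = 1;\ j = 2, \ldots, n_i \\
\sqrt{1 - \alpha^{2e_{ik}}} & \text{if } k = j;\ j = 2, \ldots, n_i \\
\alpha^{e_{i,k+1} + \cdots + e_{ij}} \sqrt{1 - \alpha^{2e_{ik}}} & \text{if } k < j;\ j = 2, \ldots, n_i \\
0 & \text{otherwise.}
\end{cases}
\tag{5.1}
$$

Its inverse, $R_i^{-1}(\alpha, \lambda)$, is a symmetric tridiagonal matrix with unique Cholesky decomposition
$L_i(\alpha, \lambda)L_i'(\alpha, \lambda)$, where $L_i(\alpha, \lambda) = [l^i_{jk}]$ and

$$
l^i_{jk} =
\begin{cases}
1/\sqrt{1 - \alpha^{2e_{i,j+1}}} & \text{if } k = j;\ j = 1, \ldots, (n_i - 1) \\
-\alpha^{e_{ij}}/\sqrt{1 - \alpha^{2e_{ij}}} & \text{if } k = j - 1;\ j = 2, \ldots, n_i \\
1 & \text{if } k = j;\ j = n_i \\
0 & \text{otherwise.}
\end{cases}
\tag{5.2}
$$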
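
The two factorizations can be checked numerically. The sketch below builds $\Gamma_i$ from (5.1) and $L_i$ from (5.2) for a vector of exponents and is offered as an illustration only; the function name `gamma_and_L` is hypothetical, and the exponent vector is passed as $(e_{i2}, \ldots, e_{in_i})$ with zero-based array indexing.

```python
import numpy as np

def gamma_and_L(alpha, e):
    """Build Gamma_i from (5.1) and L_i from (5.2); e = (e_{i2}, ..., e_{i n_i})."""
    n = len(e) + 1
    a = alpha ** np.asarray(e, dtype=float)   # a[j-2] = alpha ** e_{ij}
    s = np.sqrt(1 - a ** 2)                   # s[j-2] = sqrt(1 - alpha ** (2 e_{ij}))
    G = np.zeros((n, n))
    L = np.zeros((n, n))
    G[0, 0] = 1.0
    for j in range(1, n):                     # zero-based row j is row j + 1 in the text
        G[j, 0] = np.prod(a[:j])              # first column of Gamma_i
        for k in range(1, j + 1):
            G[j, k] = np.prod(a[k:j]) * s[k - 1]
    for j in range(n - 1):
        L[j, j] = 1.0 / s[j]                  # diagonal, rows 1, ..., n_i - 1
        L[j + 1, j] = -a[j] / s[j]            # subdiagonal
    L[n - 1, n - 1] = 1.0                     # last diagonal element
    return G, L
```

For any feasible $\alpha$, the product `G @ G.T` should reproduce $R_i(\alpha, \lambda)$ and `L @ L.T` should reproduce its inverse.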
SUMMARY

In this paper we obtain some inequalities for quadratic forms involving a symmetric matrix
and a positive semidefinite matrix. As special cases of those inequalities, we deduce several known
inequalities that are useful for the detection of outliers in statistical data analysis. We also extend
Scheffé's S-method of construction of simultaneous confidence intervals for the case where the design
matrix is not of full rank and the set of estimable functions are linearly dependent.
1.  INTRODUCTION

In a recent article, Olkin (1992) presented an interesting survey of several inequalities that are
useful in the detection of outliers in statistical data analysis. In this paper we prove some general
inequalities concerning two quadratic forms and deduce most of the inequalities in Olkin (1992)
as special cases. As another application of our theorems, we extend Scheffé's S-method of
constructing simultaneous confidence intervals for the case where the design matrix is not of full
rank and the set of estimable functions are linearly dependent. The organization of this paper is as
follows. In Section 2 we present the main theorems of this paper. Section 3 contains the statistical
applications.

2.  MAIN  RESULTS 

We  start  with  the  following  elementary  lemma,  stated  here  without  proof  since  it  is  well  known. 
It  plays  an  important  role  in  the  proofs  of  the  theorems  in  this  paper. 
LEMMA 2.1 Let $C_{n \times k}$ and $D_{k \times n}$ be two matrices. Assume that $n \ge k$. Then $(n - k)$ eigenvalues
of the matrix $CD$ are zero and the remaining $k$ eigenvalues of $CD$, some of which may be zero,
coincide with the $k$ eigenvalues of the matrix $DC$.
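
Lemma 2.1 is equivalent to the identity $\det(xI_n - CD) = x^{n-k}\det(xI_k - DC)$ between characteristic polynomials, which is easy to check numerically. The following sketch uses arbitrary random matrices; the dimensions and the seed are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2
C = rng.normal(size=(n, k))
D = rng.normal(size=(k, n))

# np.poly applied to a square matrix returns the coefficients of its
# characteristic polynomial, highest power first.
char_CD = np.poly(C @ D)      # degree n
char_DC = np.poly(D @ C)      # degree k

# x^(n-k) * char_DC(x): append n - k zero coefficients.
padded = np.concatenate([char_DC, np.zeros(n - k)])
```

Up to rounding error, `char_CD` and `padded` agree, so $CD$ has the eigenvalues of $DC$ together with $n - k$ zeros.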


We will now develop some preliminaries before stating the main theorems of this paper. Let $A$
be a symmetric matrix of order $n$, and let $B$ be a symmetric positive semidefinite matrix of order $n$ and rank
equal to $k$. Let $\mathcal{M}(B)$ denote the column space of $B$. Let $B = L_{n \times k} L'_{k \times n}$ be the rank factorization
of $B$. Let

$$
R = L(L'L)^{-1}, \qquad A^* = R'AR. \tag{2.1}
$$

Note that the Moore-Penrose inverse of $B$ (see Searle (1982), page 220) is given by $B^+ = RR'$.
Observe that the column spaces of $B$, $B^+$, $R$ and $L$ are all equal. Let $\{\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k\}$ be the
ordered set of eigenvalues of $A^*$. Applying Lemma 2.1 for $C = R$ and $D = R'A$ we can see that
the set of eigenvalues of the matrix $B^+A$ is given by $\{\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k, 0, \ldots, 0\}$. It is possible
that some of the $\lambda_i$'s may equal zero and also, all the $\lambda_i$'s may be negative. Therefore $\lambda_1$ need not
be the largest eigenvalue of $B^+A$. Similarly, $\lambda_k$ need not be the smallest eigenvalue of $B^+A$. In
fact the largest eigenvalue of $B^+A$ is given by $\max\{0, \lambda_1\}$ and the smallest eigenvalue of $B^+A$ is
equal to $\min\{0, \lambda_k\}$. We are now ready to state an inequality concerning two quadratic forms.

THEOREM 2.2 Let $A$ be a symmetric matrix of order $n$. Let $B$ be a symmetric positive semidefinite
matrix of order $n$ and rank equal to $k$. Let $\{\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k, 0, \ldots, 0\}$ be the set of $n$ eigenvalues
of $B^+A$. Then

$$
\lambda_k \, y'By \le y'Ay \le \lambda_1 \, y'By \tag{2.2}
$$

for all $y \in \mathcal{M}(B)$. There exists an eigenvector $y_i$ of $B^+A$ corresponding to the eigenvalue $\lambda_i$ such
that $y_i \in \mathcal{M}(B)$ for $1 \le i \le k$. Further, equality holds in the first inequality and in the second
inequality of (2.2) if we choose $y$ to be equal to $y_k$ and $y_1$ respectively.

Proof. Let $B = LL'$ be the rank factorization of $B$. Let $R$ and $A^*$ be as defined in (2.1). Since $\lambda_1$
and $\lambda_k$ are the largest and the smallest eigenvalues of $A^*$, by a well known inequality (see (1f.2.1)
of Rao (1973), page 62) we have

$$
\lambda_k \, v'v \le v'A^*v \le \lambda_1 \, v'v \tag{2.3}
$$

for all $v \in \mathbb{R}^k$. Let $y$ be a vector in $\mathcal{M}(B)$ and let $v = L'y$. It is easy to verify that $y = Rv$, since
$y$ is also in the column space of $L$. Thus we have

$$
v'v = y'By, \qquad v'A^*v = y'Ay. \tag{2.4}
$$

The assertion (2.2) now follows from (2.3) and (2.4). We now proceed to show that the two
inequalities in (2.2) become equalities for appropriate choices of $y$. For $1 \le i \le k$, let $v_i \ne 0$ be an
eigenvector of $A^*$ corresponding to the eigenvalue $\lambda_i$ and let $y_i = Rv_i$. Note that $y_i \ne 0$, since $R$
is of full column rank and $v_i \ne 0$. We also have

$$
B^+Ay_i = RR'ARv_i = RA^*v_i = \lambda_i Rv_i = \lambda_i y_i. \tag{2.5}
$$

Thus $y_i$ is an eigenvector of $B^+A$ corresponding to the eigenvalue $\lambda_i$. Clearly $y_i \in \mathcal{M}(B)$ since it
is in the column space of $R$. Therefore from (2.4) we have $v_i'A^*v_i = y_i'Ay_i$ and $v_i'v_i = y_i'By_i$.
Thus $y_i'Ay_i = \lambda_i \, y_i'By_i$ for all $1 \le i \le k$. Therefore the first and second inequalities in (2.2)
become equalities if we choose $y$ to be equal to $y_k$ and $y_1$ respectively. This completes the proof of
Theorem 2.2. □
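
The construction used in the proof can be traced numerically. The sketch below is illustrative only: the matrices are random, the dimensions and seed are assumptions, and the rank factor is named `Lf` to avoid confusion with other uses of $L$. It forms $R$ and $A^*$ as in (2.1), checks that $RR'$ is the Moore-Penrose inverse of $B$, and evaluates the three terms of (2.2) for a vector in the column space of $B$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3
M = rng.normal(size=(n, n))
A = (M + M.T) / 2                        # symmetric, possibly indefinite
Lf = rng.normal(size=(n, k))             # rank factor of B (full column rank)
B = Lf @ Lf.T                            # positive semidefinite, rank k

R = Lf @ np.linalg.inv(Lf.T @ Lf)        # R = L (L'L)^{-1}
A_star = R.T @ A @ R                     # A* = R'AR
lam, V = np.linalg.eigh(A_star)          # ascending: lam[0] = lambda_k, lam[-1] = lambda_1

y = B @ rng.normal(size=n)               # an arbitrary vector in M(B)
lower = lam[0] * (y @ B @ y)
middle = y @ A @ y
upper = lam[-1] * (y @ B @ y)

y1 = R @ V[:, -1]                        # eigenvector of B+A attaining the upper bound
B_plus = np.linalg.pinv(B, 1e-10)        # Moore-Penrose inverse, explicit cutoff
```

Under these assumptions `lower <= middle <= upper` holds, `B_plus` agrees with `R @ R.T`, and `y1` gives equality in the second inequality of (2.2).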

In the case where $\lambda_1 = \cdots = \lambda_k$, from (2.2) we have $y'Ay = \lambda_1 \, y'By$ for all $y \in \mathcal{M}(B)$.
The following example shows that this need not be true for vectors $y$ which are not in $\mathcal{M}(B)$. The
example also shows that (2.2) need not be true for vectors $y$ not in $\mathcal{M}(B)$.


Example  2.3  Let  A  = 


and $B = \begin{pmatrix} 1/4 & 1/4 \\ 1/4 & 1/4 \end{pmatrix}$. Clearly, $B$ is a positive semidefinite
matrix of rank $k = 1$. The Moore-Penrose inverse of $B$ is given by $B^+ = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$. It is easy to
verify that the set of eigenvalues of $B^+A$ is given by $\{1, 0\}$ and therefore $\lambda_1$ equals 1. Consider the
vector $y' = (2, 0)$ which is not in the column space of $B$. A little calculation shows that $y'By = 1$
and $y'Ay = 4$ and hence $y'Ay > \lambda_1 \, y'By$. Similarly, for $y' = (0, 2)$ we have $y'Ay < \lambda_1 \, y'By$.
Therefore this example shows that the inequalities in (2.2) need not hold for all $y$.


The next theorem gives a sufficient condition for the inequality (2.2) to hold for all $y \in \mathbb{R}^n$.

THEOREM 2.4 Let $A$ be a symmetric matrix of order $n$. Let $B$ be a symmetric positive semidefinite
matrix of order $n$ and rank equal to $k$. Let $\{\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k, 0, \ldots, 0\}$ be the set of $n$ eigenvalues
of $B^+A$. If $\mathcal{M}(A) \subset \mathcal{M}(B)$ then the matrices $\lambda_1 B - A$ and $A - \lambda_k B$ are positive semidefinite.

Proof. Fix $y \in \mathbb{R}^n$. Then we can write $y = y_b + y_b^{\perp}$, where $y_b$ is the projection of $y$ onto the
column space of $B$ and $y_b^{\perp} = y - y_b$. Note that $By_b^{\perp} = 0$. If $\mathcal{M}(A) \subset \mathcal{M}(B)$
we also have $Ay_b^{\perp} = 0$. Therefore,

$$
y'Ay = y_b'Ay_b, \qquad y'By = y_b'By_b. \tag{2.6}
$$

Since $y_b \in \mathcal{M}(B)$, by (2.2) of Theorem 2.2 we have

$$
\lambda_k \, y_b'By_b \le y_b'Ay_b \le \lambda_1 \, y_b'By_b. \tag{2.7}
$$

Combining (2.6) and (2.7) we get

$$
\lambda_k \, y'By \le y'Ay \le \lambda_1 \, y'By. \tag{2.8}
$$

Since $y \in \mathbb{R}^n$ is arbitrary, (2.8) shows that the matrices $\lambda_1 B - A$ and $A - \lambda_k B$ are positive
semidefinite. □

The next lemma shows that for any two symmetric matrices $A$ and $B$, if $\mathcal{M}(A) \subset \mathcal{M}(B)$ then
the set of eigenvalues of $B^-A$ is invariant to the choice of the g-inverse $B^-$ of $B$. Thus we can
replace $B^+$ by any g-inverse $B^-$ of $B$ in the statement of Theorem 2.4.

LEMMA 2.5 Let $A$ and $B$ be two symmetric matrices both of order $n$. If $\mathcal{M}(A) \subset \mathcal{M}(B)$ then the
set of eigenvalues of $B^-A$ is invariant to the choice of the g-inverse, $B^-$, of $B$.

Proof. Let $A$ and $B$ be two symmetric matrices of order $n$ such that $\mathcal{M}(A) \subset \mathcal{M}(B)$. By spectral
decomposition, there exists an orthogonal matrix $P$ such that

$$
A = P \begin{pmatrix} \Lambda & O \\ O & O \end{pmatrix} P' = P_1 \Lambda P_1',
$$

where $\Lambda$ is a diagonal matrix, $O$ is a null matrix and $P = [P_1 \; P_2]$ is the partition of $P$, depending on
the rank of the matrix $A$. Since $\mathcal{M}(A) = \mathcal{M}(P_1)$ and $\mathcal{M}(A) \subset \mathcal{M}(B)$, we have $\mathcal{M}(P_1) \subset \mathcal{M}(B)$.
Hence we can write $P_1 = BU$ for some matrix $U$. Therefore,

$$
A = P_1 \Lambda P_1' = BU\Lambda U'B = BVB, \tag{2.9}
$$

where $V = U\Lambda U'$ is a symmetric matrix. Let $B^-$ be a g-inverse of $B$. If we choose $C = B^-BV$
and $D = B$, then by Lemma 2.1 we have that the set of eigenvalues of $B^-A$ is exactly the same as the
set of eigenvalues of the matrix $BV$. Thus, the eigenvalues of $B^-A$ do not depend on the choice of
the g-inverse $B^-$ of $B$. This completes the proof of the lemma. □

The following example shows that the conclusion of Lemma 2.5 need not be true if we do not
assume that $\mathcal{M}(A)$ is contained in $\mathcal{M}(B)$.

Example 2.6 Consider the matrices $A$ and $B$ as in Example 2.3. It is easy to verify that $\mathcal{M}(A)$ is
not contained in $\mathcal{M}(B)$. We have seen that in Example 2.3 the set of eigenvalues of $B^+A$ is given

by  {1,  0}.  Consider  another  g-inverse  B“  =  ^  ^  q  j  ,  of  B.  We  can  easily  verify  that  the  set  of 

eigenvalues of $B^-A$ is given by $\{2, 0\}$, which is different from the set of eigenvalues of $B^+A$. Thus,
the conclusion of Lemma 2.5 need not be true if $\mathcal{M}(A)$ is not contained in $\mathcal{M}(B)$.

Let $b$ be a vector in $\mathbb{R}^n$. Theorems 2.2 and 2.4 restricted to the matrix $A = bb'$ give rise to several interesting inequalities. We treat this special case in Theorem 2.7 below. In Section 3 we will use Theorem 2.7 to prove several inequalities that are useful in the detection of outliers in statistical data analysis.

THEOREM 2.7 Let $B$ be a symmetric positive semidefinite matrix of order $n$. Let $B^{+}$ be the Moore-Penrose inverse of $B$. Let $\mathcal{M}(B)$ denote the column space of $B$. If $b$ is an $n \times 1$ vector then

$$(b'y)^2 \le b'B^{+}b \; y'By \tag{2.10}$$

for all $y \in \mathcal{M}(B)$. Moreover, equality holds in (2.10) if we choose $y = B^{+}b$. Also, if the rank of $B$ equals 1, then equality holds in (2.10) for all $y \in \mathcal{M}(B)$. If $b \in \mathcal{M}(B)$ then (2.10) holds for all $y \in \mathbb{R}^n$; equivalently, the matrix $(b'B^{+}b)B - bb'$ is positive semidefinite.

Proof. Let $b$ be an $n \times 1$ vector. Let us choose $A = bb'$ in Theorems 2.2 and 2.4. Letting $C = B^{+}b$ and $D = b'$ in Lemma 2.1 we can see that the set of eigenvalues of $B^{+}A$ is given by $\{b'B^{+}b, 0, \ldots, 0\}$. Let the rank of $B$ be equal to $k$. Then, in the notation of Theorem 2.2, the eigenvalue $\lambda_1$ equals $b'B^{+}b$, and $\lambda_k = \lambda_1$ if $k = 1$ and $\lambda_k = 0$ if $k \ge 2$. Therefore Theorem 2.7 follows from Theorems 2.2 and 2.4. □
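As a quick numerical illustration of (2.10), the sketch below takes $B$ to be the centering matrix $I_n - ee'/n$, which is symmetric and idempotent, so that $B^{+} = B$. The helper names (`centering`, `matvec`, `quad`, `gap_2_10`), the random test vectors and the tolerance are illustrative assumptions and are not part of the paper.

```python
import random

def centering(n):
    """Return B = I_n - ee'/n as a list of rows."""
    return [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]

def matvec(M, x):
    """Matrix-vector product Mx."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def quad(M, x):
    """Quadratic form x'Mx."""
    return dot(x, matvec(M, x))

def gap_2_10(B, B_plus, b, y):
    """Right side minus left side of (2.10): b'B+b * y'By - (b'y)^2."""
    return quad(B_plus, b) * quad(B, y) - dot(b, y) ** 2

random.seed(0)
n = 5
B = centering(n)  # here B+ = B because B is symmetric and idempotent
b = [random.gauss(0, 1) for _ in range(n)]

# Project arbitrary vectors onto M(B), where (2.10) is asserted to hold.
ys = [matvec(B, [random.gauss(0, 1) for _ in range(n)]) for _ in range(200)]
min_gap = min(gap_2_10(B, B, b, y) for y in ys)

# Equality case of Theorem 2.7: y = B+ b.
eq_gap = gap_2_10(B, B, b, matvec(B, b))
```

Up to rounding error, `min_gap` should be nonnegative and `eq_gap` should vanish, in line with the inequality and its equality case.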

The following Corollary 2.8 is an easy consequence of Theorem 2.7. We will apply this corollary in Section 3 to extend Scheffé's S-method of constructing simultaneous confidence intervals to the case where the design matrix is not of full rank and the set of estimable functions is linearly dependent.

COROLLARY 2.8 Let $B$ be a symmetric positive semidefinite matrix of order $n$ and $B^{-}$ be a g-inverse of $B$. Let $b \in \mathcal{M}(B)$; then $\eta B - bb'$ is positive semidefinite if and only if $\eta \ge b'B^{-}b$.

Proof. Let $b \in \mathcal{M}(B)$. It is easy to verify that $b'B^{-}b = b'B^{+}b$ for any choice of the g-inverse $B^{-}$ of $B$. Suppose that $\eta \ge b'B^{+}b$; then from Theorem 2.7 we have

$$\eta\, y'By \ge b'B^{+}b \; y'By \ge y'bb'y \tag{2.11}$$

for all $y \in \mathbb{R}^n$. Therefore $\eta B - bb'$ is positive semidefinite. The other implication follows easily, if we choose $y = B^{+}b$. □


Theorem 2.7 is essentially asserting that if $B$ is a positive semidefinite matrix and $b \in \mathbb{R}^n$ then

$$\sup_{y \in \mathcal{M}(B),\; y \neq 0} \frac{(b'y)^2}{y'By} = b'B^{+}b. \tag{2.12}$$

This generalizes the result contained in (11), Appendix A4 of Seber (1977), where the above equality (2.12) was obtained for a positive definite matrix $B$. Note that if $A$ is a symmetric matrix and $B$ is a positive semidefinite matrix then the conclusion of Theorem 2.2 can be restated as

$$\sup_{y \in \mathcal{M}(B),\; y \neq 0} \frac{y'Ay}{y'By} = \lambda_1 \quad \text{and} \quad \inf_{y \in \mathcal{M}(B),\; y \neq 0} \frac{y'Ay}{y'By} = \lambda_k \tag{2.13}$$

where $\lambda_1$ and $\lambda_k$ are the eigenvalues of $B^{+}A$ as defined in Theorem 2.2. A similar representation is also true for the other eigenvalues $\lambda_p$, $2 \le p \le (k - 1)$, and is given by the following theorem.


Theorem 2.9 Let $A$ be a symmetric matrix of order $n$. Let $B$ be a symmetric positive semidefinite matrix of order $n$ and rank equal to $k$. Let $\{\lambda_1 \ge \cdots \ge \lambda_k, 0, \ldots, 0\}$ be the set of $n$ eigenvalues of $B^{+}A$. Then there exist eigenvectors $\{y_1, \ldots, y_k\}$ of $B^{+}A$, corresponding to the eigenvalues $\{\lambda_1, \ldots, \lambda_k\}$, such that $y_i \in \mathcal{M}(B)$ and $y_i'By_j = 0$, $1 \le i \neq j \le k$. Further

$$\sup_{y \in \mathcal{B}_p,\; y \neq 0} \frac{y'Ay}{y'By} = \lambda_p \tag{2.14}$$

where $\mathcal{B}_p = \{y \in \mathcal{M}(B) : y_i'By = 0, \ 1 \le i \le (p - 1)\}$, for $2 \le p \le (k - 1)$.


Proof. Let $A$ be a symmetric matrix, let $B$ be a symmetric positive semidefinite matrix of rank equal to $k$, and let $B^{+}$ denote the Moore-Penrose inverse of $B$. Let $L$, $R$ and $A^{*}$ be as defined in (2.1). Let $\{\lambda_1 \ge \cdots \ge \lambda_k\}$ be the set of ordered eigenvalues of $A^{*}$ and $v_1, \ldots, v_k$ be corresponding orthogonal eigenvectors. By Theorem 1 of Bellman (1970), page 113, we have

$$\sup_{\{v \in \mathbb{R}^k \,:\, v_i'v = 0,\ 1 \le i \le (p-1)\},\; v \neq 0} \frac{v'A^{*}v}{v'v} = \lambda_p \quad \text{for } 2 \le p \le (k - 1). \tag{2.15}$$

Let $y = Rv$; then as $v$ varies in $\mathbb{R}^k$, the vector $y$ varies in $\mathcal{M}(B)$ and by (2.4) we have $v'v = y'By$ and $v'A^{*}v = y'Ay$. Let us define $y_i = Rv_i$ for $1 \le i \le k$. Then by Theorem 2.2, $y_i$ is the eigenvector of $B^{+}A$ corresponding to the eigenvalue $\lambda_i$. Further $y_i'By_j = 0$, since $v_i = L'y_i$ and $v_i'v_j = 0$ for $1 \le i \neq j \le k$. The identity (2.14) now follows from (2.15). □

3. STATISTICAL APPLICATIONS

In this section we present some applications of the theorems in Section 2. Our first application deals with some inequalities that are useful for the detection of outliers in statistical data. As a second application we extend Scheffé's S-method of constructing simultaneous confidence intervals to the case where the design matrix is not of full rank and the set of given estimable functions is linearly dependent.

Application 3.1 In a recent paper Olkin (1992) considered the following problem, which is of interest in the detection of outliers. Given the mean and standard deviation of a finite sample, find the maximum deviation of any particular observation from the sample mean as a multiple of the sample standard deviation. More specifically, let $\{y_1, \ldots, y_n\}$ be a sample of $n$ observations. The problem is to find the minimum value of $c$ such that

$$(y_k - \bar{y})^2 \le c \sum_{i=1}^{n} (y_i - \bar{y})^2, \quad k = 1, \ldots, n, \tag{3.1}$$

where $\bar{y} = \sum_{i=1}^{n} y_i / n$ is the sample mean. The above problem and its solution, that the best value of $c$ equals $(n - 1)/n$, was first brought into the limelight of statistics by Samuelson (1968). The inequality (3.1) with $c = (n - 1)/n$ is now popularly known as Samuelson's inequality. Olkin (1992) gave an interesting survey of the known proofs of Samuelson's inequality and raised the question whether there is room for yet another proof. He then gave a new proof with some generalizations. We now show that Samuelson's inequality and several other inequalities in Olkin (1992) follow from our theorems of Section 2.


Let $e' = (1, \ldots, 1)$ and $B = I_n - \frac{1}{n}ee'$, where $I_n$ is the identity matrix. Note that $B$ is a symmetric, idempotent matrix. Hence $B$ is positive semidefinite and $B^{+} = B$. Fix $1 \le k \le n$. Consider the vector $b_1$, where the $j$th component is given by

$$b_{1j} = \begin{cases} 1 - (1/n) & \text{if } j = k \\ -1/n & \text{if } j \neq k. \end{cases} \tag{3.2}$$

Since $b_1'e = 0$ we have $b_1 \in \mathcal{M}(B)$. Also, $b_1'B^{+}b_1 = b_1'b_1 = (n - 1)/n$. If we choose $b = b_1$, by the last assertion of Theorem 2.7 we have that $((n - 1)/n)B - bb'$ is positive semidefinite. Thus for any $y \in \mathbb{R}^n$ we get

$$\frac{n - 1}{n}\, y'By \ge y'b_1b_1'y \tag{3.3}$$

which is equivalent to Samuelson's inequality:

$$(y_k - \bar{y})^2 \le \frac{n - 1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2. \tag{3.4}$$
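Samuelson's inequality is easy to check numerically. The following is a minimal sketch; the function name `samuelson_gap`, the simulated samples and the extreme sample are illustrative choices, not taken from the paper.

```python
import random

def samuelson_gap(y):
    """Smallest value over k of ((n-1)/n) * sum_i (y_i - ybar)^2 - (y_k - ybar)^2."""
    n = len(y)
    ybar = sum(y) / n
    ss = sum((yi - ybar) ** 2 for yi in y)
    return min((n - 1) / n * ss - (yk - ybar) ** 2 for yk in y)

random.seed(1)
samples = [[random.uniform(-10, 10) for _ in range(6)] for _ in range(500)]
worst = min(samuelson_gap(y) for y in samples)

# The bound is attained when all observations but one coincide.
extreme = [0.0, 0.0, 0.0, 0.0, 5.0]
tight = samuelson_gap(extreme)
```

For the extreme sample, $\bar{y} = 1$, the sum of squares is 20, and $(4/5) \cdot 20 = 16 = (5 - 1)^2$, so the gap is zero, which shows that the constant $(n - 1)/n$ cannot be improved.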


In a similar fashion we can deduce inequalities (2.3) and (2.4) of Olkin (1992) as a consequence of Theorem 2.7 if we choose $b = b_2$ and $b = b_3$, respectively, where the $j$th components of the vectors $b_2$ and $b_3$ are given by

$$b_{2j} = \begin{cases} (1/k) - (1/n) & \text{if } 1 \le j \le k \\ -1/n & \text{if } k + 1 \le j \le n \end{cases} \tag{3.5}$$

$$b_{3j} = \begin{cases} 1/k & \text{if } 1 \le j \le k \\ -1/r & \text{if } k + 1 \le j \le k + r \\ 0 & \text{if } k + r < j \le n. \end{cases} \tag{3.6}$$

Also the inequality in Olkin (1992) involving the Gini mean difference, due to Nair (1956), follows from Theorem 2.7 if we choose $b = b_4$, where the $j$th component of $b_4$ is given by

$$b_{4j} = \frac{2(2j - n - 1)}{n(n - 1)} \quad \text{for } 1 \le j \le n. \tag{3.7}$$

Let us choose the vector $b = b_5$ in Theorem 2.7, where the $j$th component of $b_5$ is given by

$$b_{5j} = \begin{cases} -1 & \text{if } j = 1 \\ 1 & \text{if } j = n \\ 0 & \text{otherwise.} \end{cases} \tag{3.8}$$

Clearly $b_5 \in \mathcal{M}(B)$ and $b_5'B^{+}b_5 = b_5'b_5 = 2$. Thus from Theorem 2.7 we have that $2B - b_5b_5'$ is positive semidefinite. For a vector $y' = (y_1, \ldots, y_n)$, let $\tilde{y}' = (y_{(1)}, \ldots, y_{(n)})$ where the $y_{(i)}$'s are the ordered values of the components of $y$. Since $2B - b_5b_5'$ is positive semidefinite we have

$$2\,\tilde{y}'B\tilde{y} \ge \tilde{y}'b_5b_5'\tilde{y} \tag{3.9}$$

which after simplification reduces to an inequality, due to Thompson (1955), given by

$$(y_{(n)} - y_{(1)})^2 \le 2 \sum_{i=1}^{n} (y_i - \bar{y})^2. \tag{3.10}$$
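Thompson's bound on the squared range can be checked in the same spirit. This is a small sketch under assumed names (`thompson_gap`) and simulated data; it illustrates the inequality and is not the derivation used in the paper.

```python
import random

def thompson_gap(y):
    """2 * sum_i (y_i - ybar)^2 - (max(y) - min(y))^2, which (3.10) says is nonnegative."""
    n = len(y)
    ybar = sum(y) / n
    ss = sum((yi - ybar) ** 2 for yi in y)
    return 2 * ss - (max(y) - min(y)) ** 2

random.seed(2)
worst = min(thompson_gap([random.gauss(0, 3) for _ in range(8)]) for _ in range(500))

# Equality: two extremes placed symmetrically about the mean, all other values at the mean.
tight = thompson_gap([-1.0, 0.0, 0.0, 1.0])
```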


We will now show that the multidimensional inequalities contained in Olkin (1992) can also be deduced from our theorems. Let $W$ be a matrix of order $l \times n$ such that $We = 0$ and $WW' = I_l$. Let $B = I_n - \frac{1}{n}ee'$ be as before and let $A = W'W$. Since $B^{+} = B$ and $Ae = 0$ we have $\mathcal{M}(A) \subset \mathcal{M}(B)$ and $B^{+}A = A$. Hence the largest eigenvalue of $B^{+}A$ equals the largest eigenvalue of $A$, which in turn equals the largest eigenvalue of $WW'$. Therefore $\lambda_1 = 1$ for this choice of $B$ and $A$, and by Theorem 2.4 the matrix $B - A$ is positive semidefinite. Thus we get

$$y'W'Wy \le y'\left(I_n - \tfrac{1}{n}ee'\right)y \quad \text{for all } y \in \mathbb{R}^n. \tag{3.11}$$

Thus for any $m \times n$ matrix $Z$, the matrix

$$Z\left(I_n - \tfrac{1}{n}ee'\right)Z' - ZW'WZ' \tag{3.12}$$

is positive semidefinite. Hence inequality (3.6) in Olkin (1992) holds.

Application 3.2 Our second application deals with multiple comparison procedures in linear models. One of the most important problems in multiple comparisons is the construction of simultaneous confidence intervals for a given set of estimable functions. Among the several methods available, Scheffé's technique has been the most popular and widely used method for the construction of simultaneous confidence intervals. A very nice description of Scheffé's S-method can be found in Seber (1977), page 128. In most texts the S-method is described assuming that the design matrix is of full rank and the set of given estimable functions is linearly independent. However, this is rarely the case in practice. As an important application of the results of Section 2 we now show that Scheffé's S-method can be extended to the case where the design matrix is not of full rank and the set of estimable functions is linearly dependent.

Consider the linear model $y = X\beta + \epsilon$, where $y$ is an $n \times 1$ vector of observations, $\beta$ is a $p \times 1$ vector of parameters, $X$ is a design matrix of order $n \times p$ and $\epsilon$ is an $n \times 1$ vector of random errors. Let us assume that $\epsilon$ is distributed as multivariate normal with mean $0$ and variance-covariance matrix $\sigma^2 I_n$. Assume that the rank of $X$ is $r$, where $r \le p$. Consider $s$ estimable functions $K'\beta$, where $K_{p \times s}$ is a matrix of rank $q \le s$. It is well known that the condition of estimability is equivalent to $\mathcal{M}(K) \subset \mathcal{M}(X'X)$. Let $G$ be a g-inverse of $X'X$ and $(n - r)\hat{\sigma}^2 = y'(I_n - XGX')y$. From Theorem 4.6 of Seber (1977) it follows that the statistic

$$F = \frac{(K'\hat{\beta} - K'\beta)'\,(K'GK)^{-}\,(K'\hat{\beta} - K'\beta)/q}{\hat{\sigma}^2} \tag{3.13}$$

has an F-distribution with $q$ and $(n - r)$ degrees of freedom, where $\hat{\beta}$ is any solution to the equation $X'X\beta = X'y$. Let $F^{\alpha}_{q,n-r}$ be the $100(1 - \alpha)$ percentile of the F-distribution with $q$ and $(n - r)$ degrees of freedom; then from (3.13) we have

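$$\begin{aligned} 1 - \alpha &= \Pr\left(F \le F^{\alpha}_{q,n-r}\right) \\ &= \Pr\left[(K'\hat{\beta} - K'\beta)'\,(K'GK)^{-}\,(K'\hat{\beta} - K'\beta) \le q\,\hat{\sigma}^2 F^{\alpha}_{q,n-r}\right] \\ &= \Pr\left(b'B^{-}b \le \eta\right) \end{aligned} \tag{3.14}$$

where $\eta = q\,\hat{\sigma}^2 F^{\alpha}_{q,n-r}$, $B = K'GK$ and $b = K'(\hat{\beta} - \beta)$. Note that $b \in \mathcal{M}(B)$ since $b \in \mathcal{M}(K')$ and from Lemma 3.3 below we have $\mathcal{M}(K') = \mathcal{M}(K'GK)$. Since $B$ is a symmetric positive semidefinite matrix and $b \in \mathcal{M}(B)$, by Corollary 2.8 we have that (3.14) is equivalent to

$$\begin{aligned} 1 - \alpha &= \Pr\left(h'bb'h \le \eta\, h'Bh \ \text{for all } h\right) \\ &= \Pr\left(|h'(K'\hat{\beta} - K'\beta)| \le \sqrt{\eta\, h'Bh} \ \text{for all } h\right). \end{aligned} \tag{3.15}$$

We therefore have simultaneous confidence intervals for any linear function $h'(K'\beta)$ of the estimable functions $K'\beta$, namely,

$$h'(K'\hat{\beta}) \pm \left(q F^{\alpha}_{q,n-r}\right)^{1/2} \hat{\sigma}\, \sqrt{h'(K'GK)h} \tag{3.16}$$

such that the overall probability for the whole class of such intervals is equal to $(1 - \alpha)$.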
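The interval (3.16) can be computed directly once its ingredients are available. The sketch below is an illustration under stated assumptions: the function name `scheffe_interval` and all numerical inputs are hypothetical, and the F critical value is supplied by the caller (from tables or statistical software) and is not computed here.

```python
import math

def scheffe_interval(h, k_beta_hat, KGK, q, f_crit, sigma_hat):
    """Return (lower, upper) for h'(K'beta) following (3.16).

    h          : list of s weights
    k_beta_hat : list holding the s estimated functions K'beta_hat
    KGK        : s x s matrix K'GK given as a list of rows (may be singular)
    q          : rank of K
    f_crit     : critical value F^alpha_{q, n-r}, supplied by the caller
    sigma_hat  : square root of the residual mean square
    """
    s = len(h)
    center = sum(hi * ki for hi, ki in zip(h, k_beta_hat))
    quad = sum(h[i] * KGK[i][j] * h[j] for i in range(s) for j in range(s))
    # Guard against a tiny negative value caused by rounding.
    half = math.sqrt(q * f_crit) * sigma_hat * math.sqrt(max(quad, 0.0))
    return center - half, center + half

# Hypothetical inputs: three linearly dependent functions (K'GK has rank q = 2).
KGK = [[2.0, -1.0, -1.0],
       [-1.0, 2.0, -1.0],
       [-1.0, -1.0, 2.0]]
lower, upper = scheffe_interval(h=[1.0, -1.0, 0.0],
                                k_beta_hat=[1.0, 0.5, -1.5],
                                KGK=KGK, q=2, f_crit=3.0, sigma_hat=1.0)
```

With these made-up numbers, $h'(K'GK)h = 6$ and $q F = 6$, so the half-width is 6 around the center 0.5. Note that only the rank $q$ of $K$ enters the multiplier, not the number $s$ of functions.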

We  have  used  the  following  lemma  in  Application  3.2. 

Lemma 3.3 Let $K$ and $X$ be as in Application 3.2. Suppose that $\mathcal{M}(K) \subset \mathcal{M}(X'X)$. Let $G$ be a g-inverse of $X'X$. Then $\mathcal{M}(K') = \mathcal{M}(K'GK)$.

Proof. Clearly $\mathcal{M}(K'GK) \subset \mathcal{M}(K')$. Let $G$ be a g-inverse of $X'X$. Since $\mathcal{M}(K) \subset \mathcal{M}(X'X)$ we can write $K = (X'X)D$ for some matrix $D$. Therefore the rank of $K'GK$ is the same as the rank of $D'(X'X)D$, which in turn equals the rank of $D'X'$. Thus we have

$$\text{rank}(K'GK) = \text{rank}(D'X') \ge \text{rank}(K'). \tag{3.17}$$

Since the other inequality always holds, the rank of $K'GK$ equals the rank of $K'$. This completes the proof of the lemma.

Abstract

In a recent paper, Chaganty (1997, J. Statist. Plann. Inference 63, 39-54) introduced the method of quasi-least squares (QLS) for estimating the regression, correlation and scale parameters in longitudinal data analysis problems. The QLS estimates of the regression and scale parameters are consistent even if the working correlation structure is misspecified. The estimate of the correlation parameter, however, is asymptotically biased. In this paper, we present modified (C-QLS) estimates of the correlation parameter for the following working correlation structures that are appropriate for the analysis of balanced and equally spaced longitudinal data: the unstructured matrix, for which the C-QLS estimate is a positive definite, consistent correlation matrix; and the exchangeable, tridiagonal, and autoregressive structures, for which the C-QLS estimates are feasible, consistent and robust against misspecification. We also present feasible and consistent C-QLS estimates for two structures appropriate for the analysis of unbalanced and unequally spaced longitudinal data: the Markov and generalized Markov working correlation structures that were discussed by Nunez-Anton and Woodworth (1994, Biometrics 50, 445-456) and Shults and Chaganty (1998, Biometrics 54, 1622-1630). We then present an improved consistent estimate of the scale parameter. Finally, examples are given to contrast the C-QLS estimates with estimates obtained using the widely used generalized estimating equation (GEE) approach. © 1999 Elsevier Science B.V. All rights reserved.

1. Introduction


We consider longitudinal data that can be described as follows. Let $Y_i = (y_{i1}, \ldots, y_{in})'$ be a vector of repeated measurements taken on the $i$th subject; associated with each measurement is a vector of covariates $x_{ij} = (x_{ij1}, \ldots, x_{ijp})'$; $1 \le j \le n$, $1 \le i \le m$. The $Y_i$'s are uncorrelated with an unspecified distribution that satisfies $E(y_{ij}) = \mu_{ij}$ and $\text{var}(y_{ij}) = \phi\, v(\mu_{ij})$. The variance function $v$ is assumed to be known but $\phi > 0$ may be a known constant or an unknown scale parameter. We also assume that there is an invertible function $g$, known as the link function, such that $\mu_{ij} = \mu_{ij}(\beta) = g^{-1}(x_{ij}'\beta)$, where $\beta \in \mathbb{R}^p$ is a vector of unknown regression coefficients. The correlation between the repeated measurements on each subject is modelled by a working correlation matrix $R(\alpha)$, which is a function of the vector $\alpha = (\alpha_1, \ldots, \alpha_q)'$. The set of feasible values of $\alpha$ is a subset $\mathcal{S}$ of $\mathbb{R}^q$ such that $R(\alpha)$ is a positive-definite matrix for $\alpha \in \mathcal{S}$. The vector $\alpha$ is considered to be an unknown nuisance parameter that must be eliminated to estimate $\beta$, the main parameter of interest.

There are numerous papers in the literature concerning estimation of $\beta$ and $\alpha$ that use the method of generalized estimating equations (GEE), introduced by Liang and Zeger (1986). In this paper, we primarily focus on an alternative method of estimation known as quasi-least squares (QLS) that was described in Chaganty (1997) and Shults and Chaganty (1998). Both the GEE and the QLS methods use the same estimating equation for $\beta$ and both methods yield a consistent estimate for $\beta$. Furthermore, as $m \to \infty$, the asymptotic relative efficiency of the QLS estimate of $\beta$ with respect to the corresponding GEE estimate is 1. The two methods differ, however, in estimation of the working correlation parameter $\alpha$. The QLS method estimates $\alpha$ by minimizing the generalized error sum of squares (see Eq. (2.1)), whereas the GEE method uses a moment estimate for the correlation parameter $\alpha$.

The QLS estimate of the correlation parameter $\alpha$ is asymptotically biased. In this paper, we eliminate this asymptotic bias by modifying the QLS estimate of the correlation parameter using continuous and one-to-one transformations that depend on the working correlation matrix. Our goal in eliminating the bias of the QLS correlation parameter estimate is to allow for consistent estimation of the standard errors of the regression parameter estimate. Consider first the situation where we have an equal number of measurements on each subject that are observed at equally spaced time points. Suppose that the working correlation is totally unspecified. In this case the modified QLS (C-QLS) estimate of the correlation matrix is positive definite and consistent. Next, suppose that a structured correlation matrix can be used to describe the pattern of correlation among observations collected on each subject and that reasonable candidates for the correlation structure include the AR(1), equicorrelated, and the tridiagonal. Each of these structures depends on a single parameter $\rho$, which represents the correlation between two adjacent observations collected on a subject. (We shall use $\alpha$ to denote the working correlation parameter and $\rho$ to denote the true correlation parameter.) In this situation, we recommend using a two-stage procedure to estimate the regression and correlation parameters. The first stage uses the AR(1) structure as the working structure; it yields a C-QLS estimate $\hat{\rho}_m$ of the correlation parameter $\rho$ that is not only feasible and consistent, but is also robust against misspecification of the correlation structure among the AR(1), equicorrelated, and tridiagonal working correlation structures. In the second stage we obtain the final C-QLS estimate of the regression parameter by solving the estimating equation for $\beta$ using $\hat{\rho}_m$ and the most appropriate working structure for the data being analyzed. This will ensure that we do not lose any efficiency in estimating the regression parameter.

Consider now the situation in which the observations on each subject are unbalanced and unequally spaced and the correlation between measurements on each subject depends on their separation in time. The intra-subject correlation may also stabilize over time, in the sense that two successive measurements taken on a subject later during a study will be more highly correlated than if they are collected earlier. Two correlation models that are appropriate in these situations are the Markov and the generalized Markov (see Nunez-Anton and Woodworth, 1994). As was discussed in Shults and Chaganty (1998), the method of GEE may yield infeasible estimates of the correlation parameter for the Markov structure and cannot easily be applied for the generalized Markov structure. In this paper, we derive a consistent C-QLS estimate of the correlation parameter for the Markov and generalized Markov structures under an assumption that the correlation model has been correctly specified. The C-QLS estimate of the regression parameter $\beta$ is obtained by solving the estimating equation for $\beta$ using the C-QLS estimate of the correlation parameter $\rho$.

The organization of this paper is as follows: In Section 2 we briefly describe the method of QLS and give an expression for the asymptotic bias of the correlation parameter estimate. In Section 3 we obtain consistent estimates of the true correlation parameter using continuous and one-to-one transformations on the QLS estimate of the working correlation parameter for several correlation models: the unstructured (Section 3.1); the AR(1), tridiagonal, and equicorrelated (Section 3.2); and the Markov (Section 3.3) and generalized Markov (Section 3.4). In Section 4 we propose an improved consistent estimate of the scale parameter $\phi$. We then apply GEE and the C-QLS approach with the modified regression, correlation and scale parameter estimates in analyses of equally spaced and balanced dental data (Section 5.1) and unequally spaced and unbalanced audiology data (Section 5.2). Finally, the appendix contains the proof of consistency of the scale parameter estimate.


2.  Quasi-least  squares 

Here,  we  briefly  describe  the  method  of  quasi-least  squares  when  the  data  comprise 
equal  numbers  of  measurements  (balanced  observations)  that  are  collected  at  equally 
spaced  time  points  on  each  of  a  group  of  independent  subjects.  For  a  more  detailed 
description  see  Chaganty  (1997)  and  Shults  and  Chaganty  (1998). 

Let $Z_i(\beta) = A_i^{-1/2}(\beta)(Y_i - \mu_i(\beta))$, where $\mu_i(\beta) = (\mu_{i1}(\beta), \ldots, \mu_{in}(\beta))'$ and $A_i(\beta) = \text{diag}(v(\mu_{i1}(\beta)), \ldots, v(\mu_{in}(\beta)))$ are the vector of means and the diagonal matrix of variances, respectively; $1 \le i \le m$. The method of QLS obtains estimates by partially minimizing the generalized error sum of squares

$$Q(\beta, R(\alpha)) = \sum_{i=1}^{m} Z_i'(\beta)\, R^{-1}(\alpha)\, Z_i(\beta) \tag{2.1}$$

with respect to $\beta \in \mathbb{R}^p$ and $\alpha \in \mathcal{S}$. Note that the quadratic form in Eq. (2.1) depends not only on $\beta$ and $\alpha$, but also on the structure of the correlation matrix $R(\alpha)$. The estimating equations obtained by taking partial derivatives of Eq. (2.1) with respect to $\beta$ and $\alpha$ are

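$$\sum_{i=1}^{m} D_i'(\beta)\, A_i^{-1/2}(\beta)\, R^{-1}(\alpha)\, Z_i(\beta) = 0 \tag{2.2}$$

and

$$\sum_{i=1}^{m} Z_i'(\beta)\, \frac{\partial R^{-1}(\alpha)}{\partial \alpha_j}\, Z_i(\beta) = 0, \tag{2.3}$$

where $D_i(\beta) = \partial \mu_i / \partial \beta'$. The QLS estimates $\hat{\beta}_m$ and $\hat{\alpha}_m$ of $\beta$ and $\alpha$ are the solutions of the two Eqs. (2.2) and (2.3). Let $\bar{R}$ be the true unknown correlation structure between the repeated measurements on each subject. Under some conditions, appealing to the weak law of large numbers, Chaganty (1997) has shown that

$$\hat{\beta}_m \to \beta \quad \text{and} \quad \hat{\alpha}_m \to \alpha - J^{-1}(\alpha)\, a(\alpha) \tag{2.4}$$

in probability as $m \to \infty$, where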

«■>-  K^r*)L  and  ^ 

(The $-$ sign in Eq. (2.4) was incorrectly written as $+$ in Chaganty (1997).) It follows from Eq. (2.4) that $\hat{\beta}_m$ is a consistent estimate of $\beta$, even if the working correlation is misspecified. But $\hat{\alpha}_m$ has an asymptotic bias given by $J^{-1}(\alpha)\, a(\alpha)$, which depends both on the working and the true correlation matrices.


3.  Consistent  estimate  of  the  true  correlation  parameter 

Here, we obtain continuous and one-to-one transformations of the QLS estimate of the working correlation parameter to obtain a consistent estimate of the true correlation parameter. These transformations depend on the working correlation structure that is most appropriate for our data. We first consider longitudinal data that are balanced and equally spaced in time. We derive transformations in a closed form for the unstructured correlation matrix (Section 3.1) and for the AR(1), equicorrelated, and tridiagonal structures (Section 3.2). We next derive bias-eliminating transformations for two structures appropriate for the analysis of unbalanced and unequally spaced data - the Markov (Section 3.3) and the generalized Markov (Section 3.4). These transformations are not in a closed form, but the C-QLS estimates of the correlation parameters can be obtained numerically.

3.1. Unstructured correlation matrix

Suppose that the working correlation matrix $R(\alpha) = R$ is unstructured. Here the feasible set $\mathcal{S}_u$ is the class of all positive-definite correlation matrices. Let $\hat{\beta}_{um}$ and $\hat{R}_m$ be the QLS estimates of $\beta$ and $R$, respectively. Since the QLS estimate $\hat{R}_m$ is asymptotically biased, we will use a transformation on $\hat{R}_m$ to obtain a consistent estimate. From the results of Whittle (1958) and Olkin and Pratt (1958) we know that every positive definite matrix $\Sigma$ admits a unique decomposition

$$\Sigma = R \Lambda R, \tag{3.1}$$

where $\Lambda$ is a diagonal matrix of positive elements and $R$ is a correlation matrix. Clearly, decomposition (3.1) also holds for the subclass of positive definite correlation matrices. The function $f_u : \bar{R}\ (= R \Lambda R) \to R$ is a continuous, one-to-one and onto mapping from $\mathcal{S}_u$ to $\tilde{\mathcal{S}}_u$, where $\tilde{\mathcal{S}}_u = \{R \in \mathcal{S}_u : v = (R \circ R)^{-1}e > 0\}$. Here, $e$ is a vector of ones and $\circ$ denotes the Hadamard product. Furthermore $f_u(\phi \bar{R}) = f_u(\bar{R})$ for all $\phi > 0$. It is easy to verify that for $R \in \tilde{\mathcal{S}}_u$, $f_u^{-1}(R) = R \Lambda R = \bar{R}$, where $\Lambda = \text{diag}(v)$. For $n = 2$ we have $\tilde{\mathcal{S}}_u = \mathcal{S}_u$ and for $n > 2$, the set $\tilde{\mathcal{S}}_u$ is a proper open subset of $\mathcal{S}_u$. See Olkin and Pratt (1958) (p. 233) for an example of a correlation matrix that is in $\mathcal{S}_u$ but not in $\tilde{\mathcal{S}}_u$.
If the working correlation is unstructured, we can obtain a bias corrected estimate $\hat{R}_{cm}$ of the true correlation matrix using the following three steps:

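Step 1: Assume that the working correlation matrix is unstructured and compute the QLS estimates $\hat{\beta}_{um}$ and $\hat{R}_m$. See Chaganty (1997) (Example 4.4) for computational details.

Step 2: Compute $Z_{um} = (1/m) \sum_{i=1}^{m} Z_i(\hat{\beta}_{um})\, Z_i'(\hat{\beta}_{um})$ and $v_m = (\hat{R}_m \circ \hat{R}_m)^{-1} e$, where $\circ$ denotes the Hadamard product.

Step 3: Obtain the modified estimate of the correlation matrix,

$$\hat{R}_{cm} = \begin{cases} \hat{R}_{um} = f_u^{-1}(\hat{R}_m) = \hat{R}_m\, \text{diag}(v_m)\, \hat{R}_m & \text{if } v_m > 0 \ (\text{i.e. } \hat{R}_m \in \tilde{\mathcal{S}}_u), \\ \hat{R}_{sm} = (\text{diag}(Z_{um}))^{-1/2}\, Z_{um}\, (\text{diag}(Z_{um}))^{-1/2} & \text{otherwise.} \end{cases} \tag{3.2}$$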
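The first branch of Step 3 can be sketched in a few lines. The code below assumes that a QLS estimate $\hat{R}_m$ is already available (computing it is Step 1 and is not shown); the function names `solve` and `cqls_unstructured` and the example matrices are hypothetical, and the small linear solver is a generic Gauss-Jordan routine with no handling of singular systems.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def cqls_unstructured(R):
    """Return (v, R diag(v) R) for a correlation matrix R, or (v, None) if some v_j <= 0."""
    n = len(R)
    hadamard = [[R[i][j] ** 2 for j in range(n)] for i in range(n)]
    v = solve(hadamard, [1.0] * n)  # v = (R o R)^{-1} e
    if min(v) <= 0:
        return v, None  # Step 3 would then use the second branch of (3.2)
    corrected = [[sum(R[i][k] * v[k] * R[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]
    return v, corrected

# Hypothetical QLS estimate of an unstructured 3 x 3 working correlation matrix.
R_hat = [[1.0, 0.3, 0.2],
         [0.3, 1.0, 0.4],
         [0.2, 0.4, 1.0]]
v, R_c = cqls_unstructured(R_hat)

# 2 x 2 case with off-diagonal 0.5: the corrected off-diagonal is 2(0.5)/(1 + 0.25) = 0.8.
_, R_c2 = cqls_unstructured([[1.0, 0.5], [0.5, 1.0]])
```

The diagonal of $\hat{R}_m\, \text{diag}(v_m)\, \hat{R}_m$ equals $(\hat{R}_m \circ \hat{R}_m)v_m = e$, which is why the corrected matrix has unit diagonal whenever $v_m > 0$.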

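We will now establish that $\hat{R}_{cm}$ is a consistent estimate of $\bar{R}$. From Example 4.4 in Chaganty (1997) we know that $Z_{um}$ can be written as

$$Z_{um} = \hat{R}_m \Lambda_m \hat{R}_m, \tag{3.3}$$

where $\Lambda_m$ is a diagonal matrix of positive elements. Since $Z_{um} \to \phi \bar{R}$ almost surely, as $m \to \infty$, and decompositions (3.1) and (3.3) are unique, it is easy to see that $f_u(\phi \bar{R}) = f_u(\bar{R}) = R$, where $R = \lim_{m \to \infty} \hat{R}_m$. Also for sufficiently large $m$, $\hat{R}_m \in \tilde{\mathcal{S}}_u$, since $\tilde{\mathcal{S}}_u$ is an open set and $R \in \tilde{\mathcal{S}}_u$. Therefore,

$$\hat{R}_{um} = f_u^{-1}(\hat{R}_m) \to f_u^{-1}(R) = \bar{R} \tag{3.4}$$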

150 N. R. Chaganty, J. Shults / Journal of Statistical Planning and Inference 76 (1999) 145-161


in probability as $m \to \infty$. Clearly $\hat{R}_{sm}$ is a consistent estimate of $\bar{R}$. Therefore, the modified estimate $\hat{R}_{cm}$ is also a consistent estimate of the true correlation matrix $\bar{R}$.

Remark 3.1. From the above discussion it is clear that $\hat{R}_{cm} = f_u^{-1}(\hat{R}_m)$ (i.e. $\hat{R}_m \in \tilde{\mathcal{S}}_u$) almost surely for sufficiently large values of $m$. In some longitudinal data analysis problems it is possible that $\hat{R}_m \notin \tilde{\mathcal{S}}_u$. We should view this outcome as an indication that our assumption, $\text{var}(y_{ij}) = \phi\, v(\mu_{ij})$, may be incorrect and that the correct specification might instead be $\text{var}(y_{ij}) = \phi_j\, v(\mu_{ij})$. In either case, $\hat{R}_{cm} = \hat{R}_{sm}$ is a consistent estimate of $\bar{R}$.

Remark 3.2. Note that the unstructured working correlation can also be used to analyze longitudinal outcomes that are not equally spaced; however, the timings of the measurements should be the same for all the subjects.


3.2.  Structured  correlation  matrix:  Balanced  and  equally  spaced  data 

When  analyzing  balanced  and  equally  spaced  data,  use  of  a  structured  correlation 
matrix  is  often  preferable  to  the  use  of  an  unstructured  matrix.  One  important  advantage 
afforded  by  fitting  a  structured  matrix  is  that  it  will  allow  for  parsimonious  modelling 
of  the  regression  and  correlation  parameters. 

The  bias  correcting  technique  that  was  used  in  Section  3.1  for  unstructured  correlation 
can  be  stated  more  formally  in  the  following  theorem  for  structured  correlation  matrices. 


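Theorem 3.2. Let $\beta$, $\alpha$, $\rho$ and $\phi$ be fixed. Let $R(\alpha)$ be the working correlation structure, and assume that the true correlation $R(\rho)$ is also structured, where $\alpha$ and $\rho$ are vectors in $\mathcal{S}$, which is a subset of $\mathbb{R}^q$. Suppose that the solution of the equation

$$b(\alpha, \rho) = \left[ \text{tr}\left( \frac{\partial R^{-1}(\alpha)}{\partial \alpha_i}\, R(\rho) \right) \right]_{q \times 1} = 0 \tag{3.5}$$

is given by $\alpha = f(\rho)$, that is, $b(f(\rho), \rho) = 0$ or equivalently $b(\alpha, f^{-1}(\alpha)) = 0$, where $f$ is a continuous and one-to-one function. If $\hat{\alpha}_m$ is the QLS estimate of $\alpha$ then the C-QLS estimate $\hat{\rho}_m = f^{-1}(\hat{\alpha}_m)$ is a consistent estimate of $\rho$.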


Proof. Let $\hat{\beta}_m$ and $\hat{\alpha}_m$ be the QLS estimates of $\beta$ and $\alpha$, respectively, when the working correlation structure is $R(\alpha)$. Note that $\hat{\alpha}_m$ is the solution of the equation

$$\left[ \text{tr}\left( \frac{\partial R^{-1}(\alpha)}{\partial \alpha_i}\, \hat{Z}_m \right) \right]_{q \times 1} = 0, \tag{3.6}$$

where $\hat{Z}_m = (1/m) \sum_{i=1}^{m} Z_i(\hat{\beta}_m)\, Z_i'(\hat{\beta}_m)$. Since $\hat{\beta}_m \to \beta$, we have $\hat{Z}_m \to \phi R(\rho)$ and therefore $\hat{\alpha}_m \to f(\rho)$ in probability, as $m \to \infty$. Hence, $\hat{\rho}_m = f^{-1}(\hat{\alpha}_m) \to \rho$ in probability as $m \to \infty$. This completes the proof of the theorem. □

Remark 3.3. An alternative view of our bias correcting technique can be summarized as follows. Note that the expected value of $Z_m = (1/m) \sum_{i=1}^{m} Z_i(\beta)\, Z_i'(\beta)$ equals $\phi R(\rho)$. If we modify the estimating Eq. (2.3) so as to make it unbiased we obtain the equation

(3.7)

To get a consistent estimate of the true correlation parameter $\rho$, we could directly solve the unbiased estimating Eq. (3.7) replacing $\beta$ with $\hat{\beta}_m$, that is, replacing $Z_m$ with $\hat{Z}_m$. But there are two drawbacks to this approach: direct solution of Eq. (3.7) requires estimation of $\phi$; and, for some common working correlation structures, even if the working correlation is correctly specified, a feasible solution may not exist. Our
method  of  estimation  overcomes  these  drawbacks.  Note  that  the  estimates  /?m,  im  and 
Pm  satisfy 


'  ( g/r'(a) 

A 


=  0 


J  qx  l 


and 


K 


8R~'(x) 


d%i 


R(Pm) 


=  0. 


qx  1 


Therefore,  we  have 

'(*> 


'  (dR-y 

V  ^ 


(Zm  -  <j>R{pm)) 


=  0  V<£>  0. 


J  qx  1 


(3.8) 


(3.9) 


(3.10) 


Thus,  the  C-QLS  method  gives  a  solution  to  the  unbiased  estimating  Eq.  (3.7)  that 
does  not  depend  on  0.  Moreover,  the  estimate  pm  of  the  true  correlation  parameter 
exists,  is  feasible,  unique,  and  easy  to  compute.  The  standard  conditions  required  to 
establish  consistency  are  satisfied  so  that  pm  is  indeed  consistent. 


Remark 3.4. Theorem 3.2 requires the specification of the working and the true underlying correlation structure of our data. It will therefore be useful when the working correlation is correctly specified, or when the function $f(\rho)$ in Theorem 3.2 does not depend on the structure of the true correlation matrix $R$. The latter situation occurs in the analysis of balanced and equally spaced data if we choose an appropriate working correlation structure. For unbalanced and unequally spaced correlated data we will assume that the former is true, that is, that we have correctly specified the working correlation matrix. However, the correlation model that we consider in this paper is extremely flexible and thus is appropriate for data that have a wide range of characteristics that are typical for unbalanced and unequally spaced longitudinal data. Because of this, and in the absence of appropriate alternative correlation structures, the assumption that we have correctly specified the working correlation model will be reasonable in most analyses of unbalanced and unequally spaced longitudinal outcomes.

In many practical situations when the longitudinal data are balanced and are observed at equally spaced time intervals, the correlation parameter $\alpha$ is a real variable and the useful and popular working correlation structures are: (i) identity ($R^{(1)}$); (ii) equicorrelated ($R^{(2)}(\alpha)$), where all the off-diagonal elements equal $\alpha$; (iii) AR(1) ($R^{(3)}(\alpha)$), where the $(i,j)$ element is given by $\alpha^{|i-j|}$; and (iv) tridiagonal ($R^{(4)}(\alpha)$), where the elements just above and below the diagonal equal $\alpha$ and the others are zero. We observe that if the working correlation matrix $R(\alpha)$ is AR(1) then the solution of Eq. (3.5) is

$$\alpha = f_a(\rho) = \begin{cases} \dfrac{1-\sqrt{1-\rho^2}}{\rho} & \text{if } \rho \ne 0,\\[2ex] 0 & \text{if } \rho = 0, \end{cases} \qquad (3.11)$$

when the true correlation structure $R(\rho) = R^{(j)}$, $j = 1,2,3,4$. We exploit this important observation to obtain a consistent and robust estimate of the true correlation parameter $\rho$. Note that for the AR(1) working correlation structure, the set $\mathcal{S}_a$ of feasible values of $\alpha$ is the open interval $(-1,1)$ and the function $f_a(\rho)$ is a continuous, one-to-one and onto mapping from $\mathcal{S}_a$ to $\mathcal{S}_a$. The inverse mapping is

$$f_a^{-1}(\alpha) = \rho = \frac{2\alpha}{1+\alpha^2}. \qquad (3.12)$$

Let $\tilde\beta_{am}$ and $\tilde\alpha_{am}$ be the QLS estimates of $\beta$ and $\alpha$, respectively, when the working correlation has AR(1) structure. Then it follows from Theorem 3.2 that

$$\hat\rho_{am} = f_a^{-1}(\tilde\alpha_{am}) \to \rho \qquad (3.13)$$

in probability as $m \to \infty$, when $R(\rho) = R^{(j)}(\rho)$ for $j = 2,3,4$. We can easily check that if $\alpha = 0$ then $\tilde\alpha_{am}$ as well as $\hat\rho_{am}$ both converge to 0 in probability as $m \to \infty$, when the true correlation matrix is $R^{(1)}$, the identity matrix. Therefore, the estimate $\hat\rho_{am}$ of the true correlation parameter $\rho$ is consistent, feasible, and robust against misspecification among the four most widely used correlation models for analyzing balanced and equally spaced longitudinal data.
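
The pair of maps in Eqs. (3.11) and (3.12) can be checked numerically. The sketch below is illustrative only: the function names `f_a` and `f_a_inverse` are hypothetical, and the code assumes $|\rho| < 1$ and $|\alpha| < 1$. It confirms that applying the bias correction to the limiting QLS value returns the original $\rho$.

```python
import math


def f_a(rho: float) -> float:
    """Eq. (3.11): limit of the AR(1) QLS estimate when the lag-one correlation is rho."""
    if rho == 0.0:
        return 0.0
    return (1.0 - math.sqrt(1.0 - rho * rho)) / rho


def f_a_inverse(alpha: float) -> float:
    """Eq. (3.12): bias-correcting map from the QLS estimate back to rho."""
    return 2.0 * alpha / (1.0 + alpha * alpha)


if __name__ == "__main__":
    for rho in (-0.9, -0.3, 0.0, 0.5, 0.8):
        alpha = f_a(rho)
        # The round trip should return rho up to floating-point error.
        print(f"rho={rho:+.4f}  alpha={alpha:+.6f}  back={f_a_inverse(alpha):+.6f}")
```

In practice only `f_a_inverse` is needed: it is applied to the QLS estimate to obtain the C-QLS estimate of Eq. (3.13).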


3.3.  Unbalanced  and  unequally  spaced  data:  Markov  structure 

Suppose that the longitudinal data are unbalanced and that measurements are made on subject $i$ at times $0 < t_{i1} < t_{i2} < \cdots < t_{in_i}$; $1 \le i \le m$. A suitable working correlation for these unequally spaced repeated measurements is the generalized Markov structure given by

$$R_i(\alpha) = \begin{pmatrix}
1 & \eta^{e_{i2}} & \eta^{e_{i2}+e_{i3}} & \cdots & \eta^{e_{i2}+e_{i3}+\cdots+e_{in_i}} \\
\eta^{e_{i2}} & 1 & \eta^{e_{i3}} & \cdots & \eta^{e_{i3}+e_{i4}+\cdots+e_{in_i}} \\
\vdots & \vdots & \ddots & & \vdots \\
\eta^{e_{i2}+e_{i3}+\cdots+e_{in_i}} & \eta^{e_{i3}+e_{i4}+\cdots+e_{in_i}} & \cdots & \eta^{e_{in_i}} & 1
\end{pmatrix}, \qquad (3.14)$$

where $\alpha = (\alpha_1 = \eta,\ \alpha_2 = \lambda)'$ and the $e_{ik}$'s are functions of the parameter $\lambda$ defined as follows:

$$e_{ik}(\lambda) = \begin{cases} \dfrac{t_{ik}^{\lambda} - t_{i(k-1)}^{\lambda}}{\lambda} & \text{if } \lambda \ne 0,\\[2ex] \log(t_{ik}) - \log(t_{i(k-1)}) & \text{if } \lambda = 0, \end{cases} \qquad (3.15)$$

for $2 \le k \le n_i$; $1 \le i \le m$. The feasible range for the parameter $\lambda$ is $(-\infty, \infty)$ and $\eta$ is restricted to $(0,1)$. We will discuss the bias correction for this structure in Section 3.4, but first we consider the Markov correlation model, the important special case of the generalized Markov structure when $\lambda = 1$. The Markov structure is appropriate when the correlation between unequally spaced measurements collected on a subject decreases with increasing separation in time. Here, $e_{ik} = t_{ik} - t_{i(k-1)}$; $2 \le k \le n_i$; $1 \le i \le m$. The correlation parameter $\alpha = \eta$ is a real variable and is restricted to the interval $(0,1)$. Let us assume that the working correlation is the Markov structure and it is correctly specified, that is, the true correlation structure $R_i$ is also given by Eq. (3.14) and $\lambda = 1$. The bias correcting equation in this case is


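$$b(\alpha,\rho) = \sum_{i=1}^{m} \operatorname{tr}\left(\frac{\partial R_i^{-1}(\alpha)}{\partial\alpha}\,R_i(\rho)\right) = 2\sum_{i=1}^{m}\sum_{k=2}^{n_i} \frac{2e_{ik}\,\alpha^{2e_{ik}-1} - \rho^{e_{ik}}\,e_{ik}\left[\alpha^{e_{ik}-1} + \alpha^{3e_{ik}-1}\right]}{\left[1-\alpha^{2e_{ik}}\right]^2} = 0. \qquad (3.16)$$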


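Note that Eq. (3.16) reduces to Eq. (3.12) when the data are balanced and equally spaced, that is, $n_i = n$ and $e_{ik} = 1$ for all $i$ and $k$. Let $\tilde\alpha_m$ be the QLS estimate. The bias corrected estimate is then obtained by solving the equation

$$b(\tilde\alpha_m, \rho) = 0 \qquad (3.17)$$

for $\rho$. That is, given $\tilde\alpha_m$ the bias corrected estimate $\hat\rho_m$ satisfies the equation

$$\sum_{i=1}^{m}\sum_{k=2}^{n_i} \frac{2e_{ik}\,\tilde\alpha_m^{2e_{ik}}}{\left[1-\tilde\alpha_m^{2e_{ik}}\right]^2} = \sum_{i=1}^{m}\sum_{k=2}^{n_i} \frac{e_{ik}\,\hat\rho_m^{e_{ik}}\left[\tilde\alpha_m^{e_{ik}} + \tilde\alpha_m^{3e_{ik}}\right]}{\left[1-\tilde\alpha_m^{2e_{ik}}\right]^2}, \qquad (3.18)$$

where $e_{ik} = t_{ik} - t_{i(k-1)}$; $2 \le k \le n_i$; $1 \le i \le m$. That there is a unique solution $\hat\rho_m$ for Eq. (3.18) that lies in the interval $(0,1)$ can be shown as follows. Let us denote the l.h.s. of Eq. (3.18) by $c$. Let

$$h(\rho) = \sum_{i=1}^{m}\sum_{k=2}^{n_i} \frac{e_{ik}\,\rho^{e_{ik}}\left[\tilde\alpha_m^{e_{ik}} + \tilde\alpha_m^{3e_{ik}}\right]}{\left[1-\tilde\alpha_m^{2e_{ik}}\right]^2}. \qquad (3.19)$$

Clearly, the function $h(\rho)$ is a continuous and increasing function of $\rho$, since $\tilde\alpha_m \in (0,1)$ and the $e_{ik}$'s are all positive. Also $h(0) = 0$ and we can easily verify that $h(1) > c$, because $\tilde\alpha_m^{e} + \tilde\alpha_m^{3e} - 2\tilde\alpha_m^{2e} = \tilde\alpha_m^{e}(1-\tilde\alpha_m^{e})^2 > 0$ for every $e > 0$. By the intermediate value theorem and the monotonicity of $h$ we conclude that there exists a unique $\hat\rho_m \in (0,1)$ such that $h(\hat\rho_m) = c$. The estimate $\hat\rho_m$ can be computed numerically using the bisection method.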
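
The bisection step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the inputs are assumed to be one list of gaps $e_{ik} = t_{ik} - t_{i(k-1)}$ per subject plus a QLS estimate strictly inside $(0,1)$, and the two sides of Eq. (3.18) are coded in the form obtained from the Markov-structure trace in Eq. (3.16).

```python
from typing import List, Sequence


def time_gaps(times: Sequence[float]) -> List[float]:
    """Gaps e_ik = t_ik - t_i(k-1) for one subject's ordered measurement times."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def _weight(alpha: float, e: float) -> float:
    """Common denominator [1 - alpha^(2e)]^2 of Eqs. (3.18) and (3.19)."""
    return (1.0 - alpha ** (2.0 * e)) ** 2


def lhs_c(alpha: float, gaps: Sequence[Sequence[float]]) -> float:
    """Left-hand side c of Eq. (3.18); depends only on the QLS estimate alpha."""
    return sum(2.0 * e * alpha ** (2.0 * e) / _weight(alpha, e)
               for subject in gaps for e in subject)


def h(rho: float, alpha: float, gaps: Sequence[Sequence[float]]) -> float:
    """Function h(rho) of Eq. (3.19); increasing in rho on [0, 1]."""
    return sum(e * rho ** e * (alpha ** e + alpha ** (3.0 * e)) / _weight(alpha, e)
               for subject in gaps for e in subject)


def solve_rho(alpha: float, gaps: Sequence[Sequence[float]], tol: float = 1e-12) -> float:
    """Bisection for the root of h(rho) = c on (0, 1)."""
    c = lhs_c(alpha, gaps)
    lo, hi = 0.0, 1.0  # h(0) = 0 < c < h(1), so the root is bracketed
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid, alpha, gaps) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With unit gaps the solver should reproduce Eq. (3.12); for example, `solve_rho(0.5, [[1.0, 1.0, 1.0]])` is expected to be close to $2(0.5)/(1+0.25) = 0.8$.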


3.4.  Unbalanced  and  unequally  spaced  data:  Generalized  Markov  structure 

The Markov structure is widely used when the intra-subject correlation of measurements decreases with increasing separation in time. However, this structure may force the intra-subject correlations to decrease too rapidly with increasing separation in time and it does not take into account the actual timings of measurements in the study. When the primary outcome of interest stabilizes over time within subjects, two successive measurements collected on a subject later during the study will be more highly correlated than if they were collected earlier, so that the correlation between these measurements will depend on their time of occurrence in the study. The generalized Markov structure generalizes the Markov model so that the decrease in intra-subject correlations may be dampened (or accelerated) with increasing separation in time. It also allows for stabilization in the outcome variable over time; see Nunez-Anton and Woodworth (1994) and Shults and Chaganty (1998) for a detailed discussion of these structures.
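
To make the structure concrete, the following sketch builds the generalized Markov correlation matrix of Eq. (3.14) for one subject from the time transformation in Eq. (3.15). The function names are hypothetical, the measurement times are assumed to be positive and strictly increasing, and $\eta$ is assumed to lie in $(0,1)$.

```python
import math
from typing import List, Sequence


def e_gap(t_prev: float, t_curr: float, lam: float) -> float:
    """Transformed time gap e_ik(lambda) of Eq. (3.15)."""
    if lam == 0.0:
        return math.log(t_curr) - math.log(t_prev)
    return (t_curr ** lam - t_prev ** lam) / lam


def generalized_markov_corr(times: Sequence[float], eta: float, lam: float) -> List[List[float]]:
    """Correlation matrix of Eq. (3.14): entry (j, k) is eta raised to the
    sum of the transformed gaps lying between measurements j and k."""
    n = len(times)
    # Cumulative transformed time, starting at 0 for the first measurement.
    cumulative = [0.0]
    for k in range(1, n):
        cumulative.append(cumulative[-1] + e_gap(times[k - 1], times[k], lam))
    return [[eta ** abs(cumulative[j] - cumulative[k]) for k in range(n)]
            for j in range(n)]


if __name__ == "__main__":
    # lam = 1 gives the ordinary Markov structure eta^|t_j - t_k|.
    for row in generalized_markov_corr([1.0, 9.0, 18.0, 30.0], eta=0.99, lam=1.0):
        print(["%.4f" % value for value in row])
```

Setting `lam=1.0` recovers the Markov structure, while values of `lam` below 1 compress later gaps relative to earlier ones, which is how the structure allows successive correlations to rise as the outcome stabilizes.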

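Suppose that the true and working correlation structures both are generalized Markov. Let $\alpha = (\eta, \lambda)$ and $\rho = (\bar\eta, \bar\lambda)$. For convenience of notation we will suppress the argument $\lambda$ and write $e_{ik}$ for $e_{ik}(\lambda)$ and $\bar e_{ik} = e_{ik}(\bar\lambda)$. Since we have an unequal number of observations on each subject, the appropriate bias correcting equations analogous to Eq. (3.5) are the following two equations:

$$b_1(\alpha,\rho) = \sum_{i=1}^{m} \operatorname{tr}\left(\frac{\partial R_i^{-1}(\alpha)}{\partial\eta}\,R_i(\rho)\right) = 0 \qquad (3.20)$$

and

$$b_2(\alpha,\rho) = \sum_{i=1}^{m} \operatorname{tr}\left(\frac{\partial R_i^{-1}(\alpha)}{\partial\lambda}\,R_i(\rho)\right) = 0. \qquad (3.21)$$

Let $\tilde\alpha_m = (\tilde\eta_m, \tilde\lambda_m)$ be the QLS estimate of $\alpha = (\eta, \lambda)$. The C-QLS estimate $\hat\rho_{gm} = (\hat\eta_m, \hat\lambda_m)$ is obtained by solving for $\rho$ simultaneously the equations $b_1(\tilde\alpha_m,\rho) = 0$ and $b_2(\tilde\alpha_m,\rho) = 0$, which after simplification reduce to the following two equations:

$$\sum_{i=1}^{m}\sum_{k=2}^{n_i} \frac{e_{ik}\,\tilde\eta_m^{e_{ik}}\left[2\tilde\eta_m^{e_{ik}} - \hat\eta_m^{\hat e_{ik}}\left(1 + \tilde\eta_m^{2e_{ik}}\right)\right]}{\left[1-\tilde\eta_m^{2e_{ik}}\right]^2} = 0 \qquad (3.22)$$

and

$$\sum_{i=1}^{m}\sum_{k=2}^{n_i} \frac{\partial e_{ik}}{\partial\tilde\lambda_m}\,\frac{\tilde\eta_m^{e_{ik}}\left[2\tilde\eta_m^{e_{ik}} - \hat\eta_m^{\hat e_{ik}}\left(1 + \tilde\eta_m^{2e_{ik}}\right)\right]}{\left[1-\tilde\eta_m^{2e_{ik}}\right]^2} = 0, \qquad (3.23)$$

where $e_{ik} = e_{ik}(\tilde\lambda_m)$, $\hat e_{ik} = e_{ik}(\hat\lambda_m)$ and

$$\frac{\partial e_{ik}}{\partial\tilde\lambda_m} = \frac{t_{ik}^{\tilde\lambda_m}\log(t_{ik}) - t_{i(k-1)}^{\tilde\lambda_m}\log(t_{i(k-1)})}{\tilde\lambda_m} - \frac{t_{ik}^{\tilde\lambda_m} - t_{i(k-1)}^{\tilde\lambda_m}}{\tilde\lambda_m^2}.$$

We used the MATLAB Optimization Toolbox routine 'constr' to obtain the QLS estimate $\tilde\alpha_m = (\tilde\eta_m, \tilde\lambda_m)$ and the MATLAB routine 'fsolve' to solve Eqs. (3.22) and (3.23) to obtain the C-QLS estimate $\hat\rho_{gm} = (\hat\eta_m, \hat\lambda_m)$ in the example discussed in Section 5.2.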


4.  Consistent  estimate  of  the  scale  parameter 

Now, suppose that the scale parameter $\phi$ is unknown. In this section we will obtain a consistent estimate of $\phi$ for working correlation structures that are appropriate for balanced and equally spaced observations and also for unbalanced and unequally spaced longitudinal data.

4.1. Balanced and equally spaced data

Suppose that the longitudinal data are balanced and are observed at equally spaced time points. In Theorems 4.1 and 4.2, we obtain a consistent estimate of $\phi$ when the working correlation is unstructured and structured, respectively.

Theorem 4.1. Let $\beta$ and $\phi$ be fixed. Let $Q(\beta, R(\alpha))$ be as defined in Eq. (2.1). Let $R$ be the true unknown correlation matrix. Assume that the working correlation structure is totally unspecified, that is, $R(\alpha) = R$. Let $(\tilde\beta_m, \tilde R_m)$ be the solution of the Eqs. (2.2) and (2.3). Let $\hat R_{cm}$ be as defined in Eq. (3.2). Assume that the conditions of Theorem 5.1 in Chaganty (1997) hold. Then

$$\frac{Q(\tilde\beta_m, \hat R_{cm})}{mn} \to \phi \qquad (4.1)$$

in probability as $m \to \infty$.

Theorem 4.2. Let $\beta$, $\alpha$ and $\phi$ be fixed. Let $Q(\beta, R(\alpha))$ be as defined in Eq. (2.1). Assume that the working correlation structure is AR(1), that is, $R(\alpha) = R^{(3)}(\alpha)$. Let $(\tilde\beta_{am}, \tilde\alpha_{am})$ be the solution of the Eqs. (2.2) and (2.3). Let $\hat\rho_{am}$ be as defined in Eq. (3.13). Assume that the conditions of Theorem 5.1 in Chaganty (1997) hold. Then

$$\frac{Q(\tilde\beta_{am}, R^{(3)}(\hat\rho_{am}))}{mn} \to \phi \qquad (4.2)$$

in probability as $m \to \infty$, when the true correlation structure $R$ is any one of the following: (i) equicorrelated, (ii) AR(1) and (iii) tridiagonal. Further, if $\rho = 0$ then Eq. (4.2) also holds when $R$ equals the identity matrix.

The proof of Theorem 4.2 is given in the appendix; Theorem 4.1 is proved similarly. From the above theorems we can see that a consistent estimate of $\phi$ is

$$\hat\phi_c = \begin{cases} Q(\tilde\beta_m, \hat R_{cm})/mn & \text{if } R(\alpha) = R,\\ Q(\tilde\beta_{am}, R^{(3)}(\hat\rho_{am}))/mn & \text{if } R(\alpha) = R^{(3)}(\alpha). \end{cases} \qquad (4.3)$$

Since the QLS estimates partially minimize the quadratic form (2.1), in practice for small samples, the estimate $\hat\phi_c$ will be smaller than the popular consistent estimate of $\phi$ given by

$$\hat\phi_p = \begin{cases} Q(\tilde\beta_m, I)/mn & \text{if } R(\alpha) = R,\\ Q(\tilde\beta_{am}, I)/mn & \text{if } R(\alpha) = R^{(3)}(\alpha), \end{cases} \qquad (4.4)$$

where $I$ is the identity matrix. Therefore, a good consistent estimate of $\phi$ is $\hat\phi_g = \min(\hat\phi_c, \hat\phi_p)$, since it yields shorter width confidence intervals for linear functions of $\beta$ than $\hat\phi_p$ or $\hat\phi_c$.
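
For the AR(1) case the three scale estimates can be sketched as below. Eq. (2.1) is not reproduced in this section, so the code assumes that $Q(\beta, R) = \sum_i Z_i'(\beta) R^{-1} Z_i(\beta)$, consistent with the quantity $v_i(\theta)$ used in the appendix. The inputs are assumed to be the standardized residual vectors $Z_i(\tilde\beta_{am})$, all of length $n$, and the function names are hypothetical. The quadratic form uses the standard identity $z'R^{-1}z = z_1^2 + \sum_{k \ge 2}(z_k - \rho z_{k-1})^2/(1-\rho^2)$ for an AR(1) correlation matrix, which avoids forming the inverse.

```python
from typing import Sequence, Tuple


def ar1_quadratic_form(z: Sequence[float], rho: float) -> float:
    """z' R^{-1} z for the AR(1) correlation matrix with parameter rho, |rho| < 1."""
    total = z[0] ** 2
    for prev, curr in zip(z, z[1:]):
        total += (curr - rho * prev) ** 2 / (1.0 - rho * rho)
    return total


def scale_estimates(residuals: Sequence[Sequence[float]],
                    rho_hat: float) -> Tuple[float, float, float]:
    """Return (phi_c, phi_p, phi_g) in the spirit of Eqs. (4.3) and (4.4), AR(1) case."""
    m = len(residuals)
    n = len(residuals[0])
    # phi_c uses the bias-corrected AR(1) correlation estimate.
    phi_c = sum(ar1_quadratic_form(z, rho_hat) for z in residuals) / (m * n)
    # phi_p uses the identity matrix in place of the correlation matrix.
    phi_p = sum(sum(v * v for v in z) for z in residuals) / (m * n)
    return phi_c, phi_p, min(phi_c, phi_p)
```

Taking the minimum simply selects whichever of the two consistent estimates is smaller in the sample at hand.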


4.2.  Unbalanced  and  unequally  spaced  data 


Suppose that we have $n_i$ observations measured on subject $i$ and that these observations may be unequally spaced in time. Assume that the working correlation structure is correctly specified and that it is Markov (generalized Markov). Let $\tilde\beta_{gm}$ and $\hat\rho_{gm}$ be the C-QLS estimates of the regression parameter $\beta$ and the true correlation parameter $\rho$, when the working correlation is Markov (generalized Markov). Let

$$\hat\phi_p = \frac{1}{m}\sum_{i=1}^{m} \frac{Z_i'(\tilde\beta_{gm})\,Z_i(\tilde\beta_{gm})}{n_i} \qquad (4.5)$$

and

$$\hat\phi_c = \frac{1}{m}\sum_{i=1}^{m} \frac{Z_i'(\tilde\beta_{gm})\,R_i^{-1}(\hat\rho_{gm})\,Z_i(\tilde\beta_{gm})}{n_i}, \qquad (4.6)$$

where the correlation matrix $R_i$ is given in Eq. (3.14). A good consistent estimate of $\phi$ is given by $\hat\phi_g = \min(\hat\phi_c, \hat\phi_p)$.


5. Examples

To contrast the QLS modified regression, correlation and scale parameter estimates described in Sections 3 and 4 with the corresponding GEE estimates, in this section we present the results of two analyses. The first is of balanced, equally spaced data, for which the AR(1) and the unstructured are reasonable candidates for a working correlation model. The second is of unbalanced, unequally spaced data, for which the Markov and generalized Markov structures are appropriate.


5.1. Analysis of balanced and equally spaced data

Here we analyze the longitudinal data displayed in Table 1 of Potthoff and Roy (1964). The data were collected in a dental study of 27 subjects (11 girls and 16 boys). They comprise measurements ($y_{ij}$'s), in millimeters, from the center of each subject's pituitary to pterygomaxillary fissure recorded at 8, 10, 12 and 14 yr of age. Jennrich and Schluchter (1986) analyzed these data using maximum likelihood procedures to illustrate the use of different covariance structures to model repeated measurements. We fit the following regression model (Model 2 in Jennrich and Schluchter (1986)):

$$\mu_{ij} = \beta_g x_{i1} + \beta_b x_{i2} + \gamma_g x_{i1} * x_{j3} + \gamma_b x_{i2} * x_{j3}, \quad 1 \le j \le 4,\ 1 \le i \le 27, \qquad (5.1)$$

where $x_{i1}$, $x_{i2}$ are indicator variables for the two sexes, girl and boy, respectively. The covariate $x_{j3}$ is the subject's age at the $j$th measurement time. It takes the values 8, 10, 12 and 14.

Table 1 contains estimates and standard errors for the regression parameters and the estimate of the scale parameter, computed using the GEE and C-QLS methods. The estimates were computed using both the AR(1) and the unstructured (UNSTR) working correlation matrices. The GEE estimates were obtained using PROC GENMOD in SAS version 6.12. The standard errors of the regression parameters were computed using the model-robust, sandwich-type estimator; see (5.10) in Chaganty (1997).


Table 1
Regression analysis of a dental study data using GEE and C-QLS methods with AR(1) and unstructured working correlation matrices

Parameter    GEE, AR(1)          GEE, UNSTR          C-QLS, AR(1)        C-QLS, UNSTR
             Est.      Std.      Est.      Std.      Est.      Std.      Est.      Std.
$\beta_g$    17.3213   0.7780    17.3973   0.7244    17.3215   0.7776    17.4018   0.6972
$\beta_b$    16.5946   1.2788    16.3236   1.1701    16.5931   1.2781    16.0523   1.1288
$\gamma_g$   0.4838    0.0629    0.4781    0.0639    0.4837    0.0629    0.4770    0.0632
$\gamma_b$   0.7965    0.1050    0.7881    0.0983    0.7695    0.1049    0.8122    0.0939
$\phi$       4.9107              4.9058              4.9106              4.9076


Table 2
Estimates of the working correlation matrices. C-QLS (GEE) estimates are above (below) the diagonal

AR(1)                                  UNSTR
—        0.6099   0.3720   0.2269      —        0.5284   0.6609   0.5084
0.6135   —        0.6099   0.3720      0.5010   —        0.5551   0.7034
0.3764   0.6135   —        0.6099      0.7363   0.5553   —        0.7269
0.2309   0.3764   0.6135   —           0.5149   0.6208   0.7788   —


Table 3
Regression analysis of audiology data using C-QLS

CORR     $\hat\beta_0$ (SE)   $\hat\beta_1$ (SE)   $\hat\beta_2$ (SE)
MARK     16.99 (3.35)         2.31 (0.32)          -0.05 (0.01)
GMARK    18.67 (3.60)         1.92 (0.31)          -0.03 (0.01)


Estimates of the working correlation matrices are displayed in Table 2. It is clear from Tables 1 and 2 that the estimates obtained by GEE and C-QLS are in reasonable agreement, with one exception: the standard errors of the regression parameter estimates obtained by C-QLS are smaller than, and hence preferable to, those obtained by GEE.

5.2.  Analysis  of  unbalanced  and  unequally  spaced  data 

In  this  section  we  apply  the  Markov  and  generalized  Markov  structures  using  the 
C-QLS  approach  to  estimation  of  the  parameters.  As  discussed  in  Shults  and  Chaganty 
(1998),  the  GEE  method  often  yields  infeasible  correlation  parameter  estimates  for 
the  Markov  structure.  It  is  also  difficult  to  apply  GEE  using  the  generalized  Markov 
model  because  moment  estimates  for  its  parameters  are  not  easy  to  obtain.  Both  these 
correlation  structures  were  not  implemented  in  the  SAS,  version  6.12,  GEE  procedure 
PROC  GENMOD. 

The data we examine (see Table 3 of Nunez-Anton and Woodworth, 1994) were also analyzed by Shults and Chaganty (1998). They comprise measurements collected during a study to compare two cochlear prostheses implanted in a group of postlingually deafened adults. The study outcome is the percentage of sentences recognized on a sentence recognition test that was administered at 1, 9, 18, and 30 months post implantation. Because not all subjects completed all four sentence recognition tests, the data are unbalanced and unequally spaced in time.

Our final regression model for the marginal mean of the outcome variable ($\mu_{ij}$) agrees with the final model fit by Nunez-Anton and Woodworth (1994):

$$\mu_{ij} = \beta_0 + \beta_1 t_{j1} + \beta_2 t_{j2}, \qquad (5.2)$$

where $t_{j1}$ is the month of measurement and $t_{j2} = t_{j1}^2$. Table 3 contains estimates and standard errors for the regression parameters that were computed using C-QLS and the Markov (MARK) and generalized Markov (GMARK) working correlation models. The robust standard errors were obtained using (5.10) in Chaganty (1997). As can be seen in Table 3, fitting the generalized Markov structure yields an estimated constant that is slightly greater in value and coefficients associated with the timings of measurements that are slightly smaller in value than the corresponding coefficients obtained by fitting the Markov model.


Table 4
C-QLS estimates of the correlation between successive measurements

CORR     1 and 9 mo.   9 and 18 mo.   18 and 30 mo.
MARK     0.9168        0.9069         0.8778
GMARK    0.8442        0.9433         0.9564


Most interesting, however, is the difference between the correlation estimates that are obtained by fitting the generalized Markov as opposed to the simpler Markov model. The generalized Markov model allows us to model the intra-subject correlations in a manner consistent with the findings of Gantz et al. (1988), who observed that there is a 'definite learning curve involved with the use of cochlear implants'. This implies that the correlation between two measurements will be greater if the measurements are collected on a subject later during the study, rather than earlier. Table 4 contains the estimated correlation between two successive measurements on a subject for the Markov (MARK) and the generalized Markov (GMARK) correlation structures. (These estimates are based on $\hat\rho_g = 0.9892$ for the Markov structure and $(\hat\eta, \hat\lambda) = (0.9305, 0.0612)$ for the generalized Markov structure.) As is shown in Table 4, the generalized Markov structure yields what we expect for the audiology data, namely increasing intra-subject correlations over time, whereas the Markov structure forces a decrease in the correlation between the successive measures on each subject.
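
The successive correlations in Table 4 follow directly from the reported parameter estimates and the measurement months 1, 9, 18 and 30. The sketch below recomputes them from Eq. (3.15); the function names are hypothetical, and the Markov row is obtained by setting $\lambda = 1$.

```python
import math
from typing import List, Sequence


def e_gap(t_prev: float, t_curr: float, lam: float) -> float:
    """Transformed time gap of Eq. (3.15)."""
    if lam == 0.0:
        return math.log(t_curr) - math.log(t_prev)
    return (t_curr ** lam - t_prev ** lam) / lam


def successive_correlations(times: Sequence[float], eta: float, lam: float) -> List[float]:
    """Correlation between each pair of consecutive measurements: eta ** e_ik(lam)."""
    return [eta ** e_gap(a, b, lam) for a, b in zip(times, times[1:])]


months = [1.0, 9.0, 18.0, 30.0]
mark = successive_correlations(months, eta=0.9892, lam=1.0)      # Markov structure
gmark = successive_correlations(months, eta=0.9305, lam=0.0612)  # generalized Markov

if __name__ == "__main__":
    print("MARK ", ["%.4f" % r for r in mark])
    print("GMARK", ["%.4f" % r for r in gmark])
```

The recomputed values should agree with Table 4 to roughly four decimal places: the Markov correlations fall as the gaps widen from 8 to 12 months, while the generalized Markov correlations rise because the small fitted $\lambda$ shrinks the later transformed gaps.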

While modelling intra-subject correlation is not our primary goal, fitting the correlation structure that is most reasonable for our data analysis situation should yield the best results in terms of analysis of our main parameter of interest. In any case, it would be contrary to the tradition of statistical modelling to assume that fitting the model that does not best approximate reality would yield optimum results for our data analysis problem. The ability to fit the generalized Markov structure, the structure appropriate when the outcome variable stabilizes over time, is thus an important advantage afforded by the C-QLS approach over GEE.

Appendix


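Proof of Theorem 4.2. Let $\theta = (\beta, \rho)'$, $\alpha$ and $\phi$ be fixed. Assume that the working correlation structure $R(\alpha)$ is AR(1). Suppose that $\tilde\theta_a = (\tilde\beta_{am}, \tilde\alpha_{am})'$ is the solution of Eqs. (2.2) and (2.3). Let $\hat\theta_c = (\tilde\beta_{am}, \hat\rho_{am})'$, where $\hat\rho_{am}$ is defined in Eq. (3.13). Assume that the conditions of Theorem 5.1 in Chaganty (1997) hold. Let $v_i(\theta) = Z_i'(\beta) R^{-1}(\rho) Z_i(\beta)$. Since $\sum_{i=1}^{m} \nabla v_i(\tilde\theta_a) = 0$, using Taylor series expansions we can write

$$\begin{aligned}
\sum_{i=1}^{m} v_i(\hat\theta_c) - \sum_{i=1}^{m} v_i(\theta) &= (\hat\theta_c - \theta)' \sum_{i=1}^{m} \nabla v_i(\theta) + \frac{1}{2}(\hat\theta_c - \theta)' \sum_{i=1}^{m} \nabla^2 v_i(\theta^*)(\hat\theta_c - \theta) \\
&= -(\hat\theta_c - \theta)' \sum_{i=1}^{m} \nabla^2 v_i(\theta^{**})(\tilde\theta_a - \theta) + \frac{1}{2}(\hat\theta_c - \theta)' \sum_{i=1}^{m} \nabla^2 v_i(\theta^*)(\hat\theta_c - \theta),
\end{aligned} \qquad (A.1)$$

where $\theta^*$, $\theta^{**}$ are points on the line joining $\hat\theta_c$ and $\theta$, and the line between $\tilde\theta_a$ and $\theta$, respectively. From Eq. (A.1) we get

$$\frac{1}{m}\left[\sum_{i=1}^{m} v_i(\theta) - \sum_{i=1}^{m} v_i(\hat\theta_c)\right] = (\hat\theta_c - \theta)' \frac{1}{m}\sum_{i=1}^{m} \nabla^2 v_i(\theta^{**})(\tilde\theta_a - \theta) - (\hat\theta_c - \theta)' \frac{1}{2m}\sum_{i=1}^{m} \nabla^2 v_i(\theta^*)(\hat\theta_c - \theta). \qquad (A.2)$$

Since $(\hat\theta_c - \theta) \to 0$ and $(\tilde\theta_a - \theta)$, $(1/m)\sum_{i=1}^{m} \nabla^2 v_i(\theta^*)$, $(1/m)\sum_{i=1}^{m} \nabla^2 v_i(\theta^{**})$ are bounded in probability, from Eq. (A.2) we get

$$\frac{1}{m}\left[\sum_{i=1}^{m} v_i(\theta) - \sum_{i=1}^{m} v_i(\hat\theta_c)\right] \to 0 \qquad (A.3)$$

in probability as $m \to \infty$. By the weak law of large numbers we also have

$$\frac{1}{m}\sum_{i=1}^{m} v_i(\theta) \to \phi\,\operatorname{tr}(R^{-1}(\rho)R), \qquad (A.4)$$

in probability as $m \to \infty$. It is easy to verify that $\operatorname{tr}(R^{-1}(\rho)R) = n$ when $R$ is any one of the following structures: (i) equicorrelated, (ii) AR(1), and (iii) tridiagonal. We can now see that Eq. (4.2) follows from Eqs. (A.3) and (A.4). When $\rho = 0$ and the true correlation is the identity matrix, we can verify that $(\hat\theta_c - \theta) \to 0$ in probability as $m \to \infty$. Thus (A.3) also holds in this case. This completes the proof of the theorem. □

1. Preliminaries. In this paper we will need the following matrix version of the Cauchy-Schwarz inequality.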

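Theorem 1. Let $B_i$, $D_i$, $1 \le i \le m$, be matrices of order $t \times p$. Let $\Sigma_i$, $1 \le i \le m$, be positive definite matrices of order $t$. Then

$$\left(\sum_{i=1}^{m} B_i' D_i\right)^{-1}\left(\sum_{i=1}^{m} B_i' \Sigma_i B_i\right)\left(\sum_{i=1}^{m} D_i' B_i\right)^{-1} - \left(\sum_{i=1}^{m} D_i' \Sigma_i^{-1} D_i\right)^{-1} \qquad (1)$$

is nonnegative definite, assuming that the inverses in (1) exist.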

2. Optimal Estimating function. The longitudinal data analysis problem considered by Liang & Zeger (1986) can be described briefly as follows: Let $\{Y_i, 1 \le i \le m\}$ be independent vectors such that $E(Y_i) = \mu_i(\beta)$ and covariance matrix $\Sigma_i = \phi A_i^{1/2}(\beta)\,\bar R\,A_i^{1/2}(\beta)$, where $\bar R$ is the true correlation between the components of the vector $Y_i$, assumed to be the same for all $1 \le i \le m$. The mean vector $\mu_i(\beta)$ and the diagonal variance matrix $A_i(\beta)$ are assumed to be known functions of $\beta_{p\times 1}$. The problem is to estimate the unknown regression parameter $\beta$. Following the ideas contained in Godambe (1960) and Godambe & Kale (1991), let us consider the class of unbiased estimating equations

$$\mathcal{G} = \left\{\sum_{i=1}^{m} B_i'(Y_i - \mu_i(\beta)) = 0\right\} \qquad (2)$$

where $B_i$, $1 \le i \le m$, are $t \times p$ matrices. Let $\Delta_i(\beta) = \partial\mu_i/\partial\beta$ be the matrix of partial derivatives of order $t \times p$. For each estimating function $g \in \mathcal{G}$, let

$$M_g = \left(\sum_{i=1}^{m} B_i' \Delta_i\right)^{-1}\left(\sum_{i=1}^{m} B_i' \Sigma_i B_i\right)\left(\sum_{i=1}^{m} \Delta_i' B_i\right)^{-1}. \qquad (3)$$

The matrix in (3) is the covariance matrix of the standardized version $g_s = -(E(\partial g/\partial\beta))^{-1} g$ of the estimating function $g$ (see Godambe & Kale (1991), p. 14). Note that when $B_i = \Sigma_i^{-1}\Delta_i$ the matrix $M_g = M_{g^*}$, where $M_{g^*} = \left(\sum_{i=1}^{m} \Delta_i' \Sigma_i^{-1} \Delta_i\right)^{-1}$. By Theorem 1 we have that $M_g - M_{g^*}$ is nonnegative definite

r? is^(y  Tm  s—** ^ for * “ the dass 

h if‘  noif'r  ^)}'  ST.m  practice  fche  £“*  Nation  A  is  unknown  Liang 

^  JS\E±£,  srff »  rr5  °f 

H  ?  losf  m  effiaency  to  misspedfication  is  given  by  I,(ff  ~ 

estimated  byr^Wg'j  «  iL  Offi*  “d'£*(fc)  C“  be 

»™toon  vetoes  to  to*  th.  tocto  to  yidd,  th,  toto  toe  t ^
1.  Summary

The growth curve model of Potthoff and Roy (1964) has been studied extensively by numerous authors via the maximum likelihood method, under the assumption of normality for the outcome variable. In this paper we apply the method of quasi-least squares developed in Chaganty (1997) and Chaganty and Shults (1999) for estimating and testing a hypothesis concerning the parameters in the growth curve model when the normality assumption is not satisfied. Large sample properties of the estimates and the test statistic are also presented.

2.  Quasi-least  squares  estimates 

The growth curve model is defined as $Y_{p\times n} = B_{p\times m}\,\Delta_{m\times r}\,A_{r\times n} + E_{p\times n}$, where $Y$ is the response matrix consisting of $p$ repeated measurements taken on $n$ individuals, $\Delta$ is an unknown parameter matrix, and $B$ and $A$ are known within-subject and between-subject design matrices of ranks $m$ and $r$ respectively. We assume that the error matrix $E$ has zero mean and $\operatorname{cov}(\operatorname{vec}(E)) = \sigma^2 I_n \otimes R(\alpha)$, where $\operatorname{vec}(E)$ is the $pn \times 1$ column vector formed by stacking the columns of the matrix $E$. Here $I_n$ is the identity matrix and $R(\alpha)$ is a correlation matrix that is a function of the unknown parameter $\alpha$. Several authors have studied this model under the assumption that $E$ has a matrix normal distribution. We do not make any such assumption in this paper. The method of quasi-least squares developed in Chaganty (1997) and Chaganty and Shults (1999) is based on minimizing the objective function

$$Q(\Delta, \alpha) = \operatorname{tr}\left((Y - B\Delta A)' R^{-1}(\alpha)(Y - B\Delta A)\right). \qquad (1.1)$$

Equating to zero the partial derivatives of (1.1) with respect to $\Delta$ and $\alpha$, we get

$$\Delta = (B' R^{-1}(\alpha) B)^{-1} B' R^{-1}(\alpha)\, Y A' (A A')^{-1} \qquad (1.2)$$

and

$$\operatorname{tr}\left(\frac{\partial R^{-1}(\alpha)}{\partial\alpha}\, U(\Delta)\right) = 0 \qquad (1.3)$$

where $U(\Delta) = (Y - B\Delta A)(Y - B\Delta A)'$. Let $(\tilde\Delta, \tilde\alpha)$ be the solution of the equations (1.2) and (1.3). We can show that $\tilde\alpha$ is asymptotically biased as $n \to \infty$, since the estimating equation (1.3) is not unbiased. However, the solution $\hat\alpha$ of the estimating equation

$$\operatorname{tr}\left(\left.\frac{\partial R^{-1}(\alpha)}{\partial\alpha}\right|_{\alpha=\tilde\alpha} R(\hat\alpha)\right) = 0 \qquad (1.4)$$

is a consistent estimate of $\alpha$. We shall call $\hat\alpha$ the quasi-least squares estimate of $\alpha$. If the correlation matrix $R(\alpha)$ has the AR(1) structure, we can verify that $\hat\alpha = 2\tilde\alpha/(1 + \tilde\alpha^2)$. The quasi-least squares estimates of $\Delta$ and $\sigma^2$ are given by

$$\hat\Delta = (B' \hat R^{-1} B)^{-1} B' \hat R^{-1} Y A' (A A')^{-1} \quad\text{and}\quad \hat\sigma^2 = \operatorname{tr}(\hat R^{-1}\hat U)/pn \qquad (1.5)$$

where $\hat U = U(\hat\Delta)$ and $\hat R = R(\hat\alpha)$. The large sample properties of the quasi-least squares estimates are established in the following theorem:

Theorem 1. Fix $\Delta$, $\alpha$ and $\sigma^2$. Let $\hat\Delta$, $\hat\alpha$ and $\hat\sigma^2$ be the quasi-least squares estimates of $\Delta$, $\alpha$ and $\sigma^2$ respectively. Assume that the matrix $W = A'(A A')^{-1/2} = (w_{ij})$ is such that $\max_{i,j} w_{ij}^2$ converges to zero as $n \to \infty$. Then as $n \to \infty$,

(a) $\operatorname{vec}((\hat\Delta - \Delta)(A A')^{1/2})$ converges in distribution to a multivariate normal distribution with mean zero and covariance matrix $\sigma^2 I_r \otimes (B' R^{-1}(\alpha) B)^{-1}$.

(b) $\hat\alpha \to \alpha$ and $\hat\sigma^2$ converges to $\sigma^2$ in probability.

3.  Test  of  Hypothesis 

We now consider the problem of testing $H_0: D\Delta E = N$ vs $H_a: D\Delta E \ne N$, where $D_{d\times m}$ and $E_{r\times e}$ are known matrices of ranks $d$ and $e$ respectively. In most situations the matrix $N$ is the null matrix. Let $\hat\Delta_0$ be the estimate obtained by minimizing $Q(\Delta, \alpha)$ with respect to $\Delta$, subject to the restriction $D\Delta E = N$. It is easy to verify that

$$\hat\Delta_0 = \hat\Delta + (B'\hat R^{-1}B)^{-1} D'\,\hat\Lambda\,E'(A A')^{-1} \qquad (1.6)$$

where $\hat\Lambda = (D(B'\hat R^{-1}B)^{-1}D')^{-1}(N - D\hat\Delta E)(E'(A A')^{-1}E)^{-1}$. We propose the test statistic

$$T = \operatorname{tr}\left((B(B'\hat R^{-1}B)^{-1}B')^{-1} B(\hat\Delta_0 - \hat\Delta) A A' (\hat\Delta_0 - \hat\Delta)' B'\right)/\hat\sigma^2. \qquad (1.7)$$

Large values of $T$ are considered to be significant and the theorem below is useful for determining the critical values.
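
The claim that the restricted estimate in (1.6) satisfies the hypothesis can be checked in one line. The derivation below is a sketch that uses only the definitions given with (1.6). The symbols are a notational assumption made here, because the source symbols are not fully legible: $\hat\Delta$ is the unrestricted estimate, $\hat\Delta_0$ is the restricted one, and $\hat\Lambda$ is the matrix defined immediately after (1.6).

```latex
\begin{aligned}
D\hat\Delta_0 E
  &= D\hat\Delta E
     + \bigl[D(B'\hat R^{-1}B)^{-1}D'\bigr]\,\hat\Lambda\,\bigl[E'(AA')^{-1}E\bigr] \\
  &= D\hat\Delta E
     + \bigl[D(B'\hat R^{-1}B)^{-1}D'\bigr]\bigl[D(B'\hat R^{-1}B)^{-1}D'\bigr]^{-1}
       (N - D\hat\Delta E)
       \bigl[E'(AA')^{-1}E\bigr]^{-1}\bigl[E'(AA')^{-1}E\bigr] \\
  &= D\hat\Delta E + (N - D\hat\Delta E) \;=\; N .
\end{aligned}
```

The two bracketed factors cancel against their inverses, so the correction term added to $\hat\Delta$ is exactly what is needed to move $D\hat\Delta E$ onto $N$.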

Theorem 2. Assume that the conditions of Theorem 1 hold. Then as $n \to \infty$, under the null hypothesis $H_0: D\Delta E = N$, the distribution of $T$ converges to a central $\chi^2$ with $pr$ degrees of freedom.

To test the hypothesis $H_0$, we could also use other multivariate tests based on the eigenvalues of the matrix in (1.7) instead of the trace.

4.  REFERENCES 

Chaganty,  N.  R.  (1997).  An  alternative  approach  to  the  analysis  of  longitudinal  data  via 
generalized  estimating  equations.  J.  Statist.  Plann.  Inference  63,  39-54. 

Chaganty,  N.  R.  and  Shults,  J.  (1999).  On  eliminating  the  asymptotic  bias  in  the  quasi-least 
squares  estimate  of  the  correlation  parameter.  J.  Statist.  Plann.  Inference  76,  145-161. 

Potthoff,  R.  F.  and  Roy,  S.  N.  (1964).  A  generalized  multivariate  analysis  of  variance  model 
useful  especially  for  growth  curve  problems.  Biometrika  51,  313-326. 


Analysis  of  Multivariate  Longitudinal  Data  Using  Quasi-Least  Squares 


By 


N.  Rao  Chaganty  and  Dayanand  N.  Naik 
Department  of  Mathematics  and  Statistics 
Old  Dominion  University 
Norfolk,  VA  23529. 


Abstract 


In this paper we consider the analysis of multivariate longitudinal data assuming a scale multiple
of Kronecker product correlation structure for the covariance matrix of the observations on each
individual. The method used for the estimation of the parameters is the quasi-least squares
method introduced by Chaganty (1997, J. Statist. Plann. Inference 63, 39-54), and further
developed by Shults and Chaganty (1998, Biometrics 54, 1622-1630) and Chaganty and Shults
(1999, J. Statist. Plann. Inference 76, 145-161). We show that the estimating equations for
the correlation parameters in the quasi-least squares method are optimal unbiased estimating
equations if the data are from a normal population. An algorithm for computing the estimates
is provided and implemented on two real life data sets. The asymptotic joint distribution of
the estimators of the regression and correlation parameters is derived and used for testing
linear hypotheses on the regression parameters.

1  Introduction


In many practical situations, observations on $n$ experimental units (or subjects) are made on a
set of $p$ response variables (or characteristics) at $t$ occasions. Thus on each experimental unit
we have a $p \times t$ matrix of observations. These data are termed multivariate repeated measures
data, or multivariate longitudinal data, or multi-response growth curve data. A few examples of
experiments that yield multivariate longitudinal data are given below.

•  In  an  experiment  to  study  the  effect  of  iron  with  vitamin  C  supplement,  n  subjects 
may  be  classified  into  one  of  the  three  groups:  Group  1  receiving  15  mg  elemental  iron 
with  25  mg  vitamin  C  three  times  a  day,  Group  2  receiving  15  mg  iron  with  50  mg 
vitamin  C  three  times  a  day  and  Group  3  receiving  simply  15  mg  iron  three  times  a  day. 
Measurements,  using  the  blood  sample,  may  be  collected  on  the  variables:  serum  iron, 
ferritin,  transferrin  saturation,  hemoglobin,  hematocrit,  and  total  iron  binding  capacity 
(TIBC).  Observations  on  these  six  variables  (p  =  6)  may  be  made  at  each  of  the  three 
time  periods  (t  =  3). 

•  In  an  experiment  where  a  new  drug  for  AIDS  is  being  tested,  on  each  of  the  n  subjects 
data  on  three  variables  (p  =  3)  (TMHR  scores,  Karofsky  scores,  and  T-4  cell  counts)  at 
three  time  periods  ( t  =  3)  during  the  study  (at  the  beginning,  after  90  days  of  treatment, 
and  after  180  days  of  treatment)  are  collected.  These  data  will  be  analyzed  in  Section  5 
of  this  paper. 

•  An experiment in a dental study concerns the relative effectiveness of two orthopedic
adjustments  of  the  mandible.  Nine  subjects  are  assigned  to  each  of  the  two  orthopedic 
treatment  groups  known  as  activator  treatments.  The  measurements  are  made  on  three 
characteristics  (p  =  3)  to  assess  the  changes  in  the  vertical  position  of  the  mandible  at 
three  time  points  (t  =  3)  of  activator  treatment.  We  will  also  analyze  these  data  in 
Section  5. 

Suppose we have $n$ subjects (possibly randomly assigned to $g$ groups) on which measurements
are made on $p$ response variables at $t$ occasions. Let $y_{ijk}$ be the observation on the $j$th
response variable taken at the $k$th time period or occasion corresponding to the $i$th individual.
Here $1 \le i \le n$, $1 \le j \le p$, $1 \le k \le t$. Also, associated with each of the $n$ subjects, suppose we
have measurements $x_{ijkl}$ taken on $q$ covariates ($1 \le l \le q$). The covariates could be categorical,
and they may or may not change with time. Let

$$x_{il} = (x_{i11l}, \ldots, x_{ip1l}, x_{i12l}, \ldots, x_{ip2l}, \ldots, x_{i1tl}, \ldots, x_{iptl})'$$

be the vector of observations on the $l$th covariate taken on the $i$th subject. Let
$X_i = [x_{i1} : \cdots : x_{iq}]_{pt \times q}$ be the matrix of measurements taken on the $q$ covariates associated with
each response variable at the $t$ occasions on the $i$th individual and

$$Y_i = \begin{bmatrix} y_{i11} & \cdots & y_{i1t} \\ \vdots & & \vdots \\ y_{ip1} & \cdots & y_{ipt} \end{bmatrix}_{p \times t}$$

be the matrix of measurements taken on the response variables on the $i$th individual. Suppose
that the expected value and the covariance matrix of $y_i = \mathrm{vec}(Y_i)$ are $E(y_i) = \mu_i(\beta) = X_i \beta$
and $\mathrm{Cov}(y_i) = \Omega$ respectively. Analysis of these data is complicated by the existence
of correlation among the measurements on the $p$ different variables together with the correlation
among measurements taken at the $t$ different occasions. The form of the covariates, that is, whether
they are subject specific, time varying, or varying with the response variables, may further
complicate the analysis. However, assuming that the data on each individual come from a
$pt$-dimensional multivariate normal distribution with dispersion matrix $\Omega$, the maximum likelihood
estimator of $\beta$ can be obtained as a function of $\Omega$ and inference can be performed using the
standard asymptotic theory of maximum likelihood estimators. Some general discussion of this,
using a mixed model approach from an applied point of view, can be found in Khattree and Naik
(1999).

If the data do not come from a multivariate normal distribution, or if the response variables
on which data are collected are not continuous, then the standard methods do not readily
apply. Recently, Chaganty (1997) introduced a new method called "quasi-least squares" for
analyzing longitudinal data. This method is an alternative to the GEE method of Liang and
Zeger (1986) and its variants. The quasi-least squares method was developed to overcome
some of the pitfalls of the GEE method (Crowder, 1995). Unlike the GEE method, which can
yield non-feasible and inconsistent estimates for the correlation parameters, Chaganty (1997)
and Chaganty and Shults (1999) have shown that the quasi-least squares method always
yields feasible and consistent estimates for the correlation parameters. The quasi-least squares
method has been successfully utilized in various practical problems involving unbalanced and
unequally spaced data. See Shults and Chaganty (1998) and Chaganty and Shults (1999).

It has been observed by Boik (1991) and Naik and Rao (1997) that assuming a Kronecker
product structured covariance, that is, $\Omega = \Omega_1 \otimes \Omega_2$, where $\Omega_1$ and $\Omega_2$ respectively are $t \times t$
and $p \times p$ positive definite matrices, has many advantages in analyzing multivariate repeated
measures data. Further, the linear model with this covariance structure reduces to the well
known Zellner's Seemingly Unrelated Regression (SUR) model when $\Omega_2 = I$. Hence in this
article we will consider a Kronecker product covariance structure for the dispersion matrix of $y_i$.
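As a small illustration of the Kronecker product structure just described, the following sketch builds $\Omega_1 \otimes \Omega_2$ with NumPy and checks how its entries map to pairs (variable, occasion) when $y_i = \mathrm{vec}(Y_i)$ stacks the columns of the $p \times t$ matrix $Y_i$. The numerical matrices and the helper `vec_index` are hypothetical and are not taken from the paper.

```python
import numpy as np

t, p = 3, 2

# Hypothetical positive definite matrices, chosen only for illustration.
omega1 = np.array([[1.0, 0.5, 0.25],
                   [0.5, 1.0, 0.5],
                   [0.25, 0.5, 1.0]])   # t x t, across occasions
omega2 = np.array([[2.0, 0.6],
                   [0.6, 1.0]])          # p x p, across response variables

omega = np.kron(omega1, omega2)          # pt x pt covariance of vec(Y_i)


def vec_index(j, k, p=p):
    """Position of Y[j, k] in vec(Y) when the columns (occasions) are stacked."""
    return k * p + j


# Covariance between variable j at occasion k and variable j2 at occasion k2.
j, k, j2, k2 = 1, 0, 0, 2
entry = omega[vec_index(j, k), vec_index(j2, k2)]
expected = omega1[k, k2] * omega2[j, j2]

# Column stacking corresponds to Fortran-order flattening in NumPy.
Y = np.arange(p * t, dtype=float).reshape(p, t)
y = Y.flatten(order="F")
```

Under this ordering each $p \times p$ block of $\Omega$ is a multiple of $\Omega_2$, with the multiplier given by the corresponding entry of $\Omega_1$.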

The main focus of this paper is to implement the quasi-least squares method for analyzing
multivariate longitudinal data assuming a scale multiple of Kronecker product correlation
structure for the covariance matrix. The organization of the paper is as follows. In Section 2,
we will describe the quasi-least squares method as applied to the present situation. We also
present a discussion of the optimality of the estimating equations and an iterative algorithm
for the computation of the estimates. In Section 3, we will derive closed form solutions for
the estimates of the correlation parameters for some popular correlation structures. In Section 4,
we will derive the joint asymptotic distribution of the quasi-least squares regression and
correlation parameter estimates. We also present a test statistic for testing linear hypotheses
concerning the regression parameter $\beta$ and derive its asymptotic distribution. We will present
the analysis of two data sets in Section 5 and finally end with some concluding remarks.


2  The  method  of  quasi-least  squares 

For analyzing multivariate repeated measures data that are continuous non-normal or categorical
we adopt the quasi-least squares method, described in Chaganty (1997), and the bias
corrected version of the correlation parameter in Chaganty and Shults (1999). To put the
problem in a slightly more general framework we assume that

$$E(y_i) = \mu_i(\beta) = g(X_i \beta), \tag{2.1}$$

where as before $X_i$ is the $pt \times q$ design matrix, $\beta$ is a $q \times 1$ vector of unknown parameters and
the inverse of $g$ is a known link function. Further assume that the covariance matrix of $y_i$ is

$$\Omega_i = \phi\, A_i^{1/2}(\beta)\, (R_T(\alpha) \otimes R_p(\gamma))\, A_i^{1/2}(\beta) = \phi\, \Sigma_i(\theta) \quad \text{(say)} \tag{2.2}$$

where $\theta = (\beta, \alpha, \gamma)'$ and $R_T(\alpha)$ and $R_p(\gamma)$ respectively are correlation matrices of order
$t \times t$ and $p \times p$, which are functions of the vectors $\alpha$ and $\gamma$ respectively. The correlation matrix
$R_T(\alpha)$ represents the correlation among the $t$ repeated measurements over time, whereas $R_p(\gamma)$
represents the correlation among the $p$ response variables. The $pt \times pt$ diagonal matrix $A_i^{1/2}(\beta)$
contains the standard deviations and $\phi$ is an overdispersion or scale parameter. The
mean-covariance model (2.1)-(2.2) encompasses several discrete and continuous models. While the
main parameter of interest is $\beta$, the parameters $\alpha$, $\gamma$ and $\phi$ are nuisance parameters.

2.1  Estimating  equations 

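Here we describe the method of quasi-least squares. This is a two stage procedure. In the first
stage we minimize with respect to $\beta$, $\alpha$ and $\gamma$ the quadratic form

$$\sum_{i=1}^n (y_i - \mu_i(\beta))' A_i^{-1/2}(\beta) \left(R_T^{-1}(\alpha) \otimes R_p^{-1}(\gamma)\right) A_i^{-1/2}(\beta) (y_i - \mu_i(\beta)) = \sum_{i=1}^n z_i(\beta)' \left(R_T^{-1}(\alpha) \otimes R_p^{-1}(\gamma)\right) z_i(\beta) \tag{2.3}$$

where $z_i(\beta) = A_i^{-1/2}(\beta)(y_i - \mu_i(\beta))$ with $(A_i^{1/2}(\beta))^{-1} = A_i^{-1/2}(\beta)$. Note that $z_i(\beta) = \mathrm{vec}(Z_i(\beta))$
where

$$Z_i(\beta) = \begin{bmatrix} z_{i11} & \cdots & z_{i1t} \\ \vdots & & \vdots \\ z_{ip1} & \cdots & z_{ipt} \end{bmatrix}_{p \times t}.$$

Since

$$\mathrm{tr}\left(R_T^{-1}(\alpha) Z_i'(\beta) R_p^{-1}(\gamma) Z_i(\beta)\right) = \mathrm{vec}(Z_i(\beta))' \left(R_T^{-1}(\alpha) \otimes R_p^{-1}(\gamma)\right) \mathrm{vec}(Z_i(\beta)) = z_i'(\beta) \left(R_T^{-1}(\alpha) \otimes R_p^{-1}(\gamma)\right) z_i(\beta)$$

we can rewrite the quadratic form (2.3) as

$$\mathrm{tr}\left(R_T^{-1}(\alpha) \sum_{i=1}^n Z_i'(\beta) R_p^{-1}(\gamma) Z_i(\beta)\right) = np\, \mathrm{tr}\left(R_T^{-1}(\alpha) U(\beta, \gamma)\right) \tag{2.4}$$

and also as

$$\mathrm{tr}\left(R_p^{-1}(\gamma) \sum_{i=1}^n Z_i(\beta) R_T^{-1}(\alpha) Z_i'(\beta)\right) = nt\, \mathrm{tr}\left(R_p^{-1}(\gamma) V(\beta, \alpha)\right) \tag{2.5}$$

where the matrix $U(\beta, \gamma) = \frac{1}{np} \sum_{i=1}^n Z_i'(\beta) R_p^{-1}(\gamma) Z_i(\beta)$ is of order $t \times t$ and
$V(\beta, \alpha) = \frac{1}{nt} \sum_{i=1}^n Z_i(\beta) R_T^{-1}(\alpha) Z_i'(\beta)$ is of order $p \times p$. Equating to zero the partial derivatives of (2.3),
(2.4) and (2.5) with respect to $\beta$, $\alpha$ and $\gamma$ respectively, we obtain the following three estimating
equations:

$$\sum_{i=1}^n D_i'(\beta) A_i^{-1/2}(\beta) \left(R_T^{-1}(\alpha) \otimes R_p^{-1}(\gamma)\right) z_i(\beta) = 0 \tag{2.6}$$

$$\mathrm{tr}\left(\frac{\partial R_T^{-1}(\alpha)}{\partial \alpha}\, U(\beta, \gamma)\right) = 0 \tag{2.7}$$

$$\mathrm{tr}\left(\frac{\partial R_p^{-1}(\gamma)}{\partial \gamma}\, V(\beta, \alpha)\right) = 0 \tag{2.8}$$

where $D_i(\beta) = \partial \mu_i / \partial \beta'$. Let $\tilde\theta = (\tilde\beta, \tilde\alpha, \tilde\gamma)'$ be the solution of the above three equations. The
estimate $\tilde\beta$ is consistent but the estimates $\tilde\alpha$ and $\tilde\gamma$ are asymptotically biased (see Theorem 4.1).
The main reason is that the estimating equation (2.6) is unbiased whereas the equations (2.7)
and (2.8) are not unbiased. The second stage in the quasi-least squares method consists of
solving the two equations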

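$$\mathrm{tr}\left(\left.\frac{\partial R_T^{-1}(\alpha)}{\partial \alpha}\right|_{\alpha = \tilde\alpha} R_T(\alpha)\right) = 0 \tag{2.9}$$

and

$$\mathrm{tr}\left(\left.\frac{\partial R_p^{-1}(\gamma)}{\partial \gamma}\right|_{\gamma = \tilde\gamma} R_p(\gamma)\right) = 0 \tag{2.10}$$

to obtain consistent estimates $\hat\alpha$ and $\hat\gamma$ of $\alpha$ and $\gamma$ respectively. Let $\hat\beta = \tilde\beta$ (or the estimate
obtained solving the equation (2.6) substituting $\hat\alpha$ and $\hat\gamma$ for $\alpha$ and $\gamma$ respectively). We shall
call the estimates $\hat\beta$, $\hat\alpha$ and $\hat\gamma$ the quasi-least squares estimates of $\beta$, $\alpha$ and $\gamma$ respectively.
Finally, a consistent estimate of $\phi$ is given by

$$\hat\phi = \min(\hat\phi_1, \hat\phi_2) \tag{2.11}$$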

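where

$$\hat\phi_1 = \frac{1}{ntp} \sum_{i=1}^n (y_i - \mu_i(\hat\beta))' \left(R_T(\hat\alpha) \otimes R_p(\hat\gamma)\right)^{-1} (y_i - \mu_i(\hat\beta))$$

and

$$\hat\phi_2 = \frac{1}{ntp} \sum_{i=1}^n (y_i - \mu_i(\hat\beta))' (y_i - \mu_i(\hat\beta)).$$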
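The equivalence of the three forms (2.3), (2.4) and (2.5) of the quadratic form can be checked numerically. The sketch below is only an illustration under assumed inputs: the standardized residual matrices $Z_i$ are random draws, and AR(1) matrices (a hypothetical choice here) stand in for $R_T(\alpha)$ and $R_p(\gamma)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, t = 5, 2, 3


def ar1(alpha, m):
    """AR(1) correlation matrix of order m (used here only as an example)."""
    idx = np.arange(m)
    return alpha ** np.abs(idx[:, None] - idx[None, :])


R_T_inv = np.linalg.inv(ar1(0.4, t))
R_p_inv = np.linalg.inv(ar1(0.3, p))
Z = rng.standard_normal((n, p, t))        # Z[i] plays the role of Z_i(beta)

# Form (2.3): quadratic form in z_i = vec(Z_i), stacking columns.
W = np.kron(R_T_inv, R_p_inv)
q_vec = sum(Zi.flatten(order="F") @ W @ Zi.flatten(order="F") for Zi in Z)

# Forms (2.4) and (2.5): traces involving U (t x t) and V (p x p).
U = sum(Zi.T @ R_p_inv @ Zi for Zi in Z) / (n * p)
V = sum(Zi @ R_T_inv @ Zi.T for Zi in Z) / (n * t)
q_U = n * p * np.trace(R_T_inv @ U)
q_V = n * t * np.trace(R_p_inv @ V)
```

All three quantities `q_vec`, `q_U` and `q_V` agree up to rounding, which is what allows the minimization over $\alpha$ and $\gamma$ to be carried out on the small matrices $U$ and $V$ rather than on the $pt \times pt$ Kronecker product.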


2.2  Optimality  of  the  estimating  equations 

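It is well known that, when $\alpha$ and $\gamma$ are known, the function

$$g_1(\beta, \alpha, \gamma) = \sum_{i=1}^n D_i'(\beta) A_i^{-1/2}(\beta) \left(R_T^{-1}(\alpha) \otimes R_p^{-1}(\gamma)\right) z_i(\beta) \tag{2.12}$$

is the optimal unbiased estimating function for estimating $\beta$ according to Godambe's criterion
(see Godambe (1960), Heyde (1997, page 22)). Since $E(U(\beta, \gamma)) = \phi R_T(\alpha)$, the function

$$g_2(\beta, \alpha, \gamma, \phi) = \mathrm{tr}\left(\frac{\partial R_T^{-1}(\alpha)}{\partial \alpha} \left(U(\beta, \gamma) - \phi R_T(\alpha)\right)\right)$$

is clearly unbiased. Also, since

$$\frac{\partial R_T^{-1}(\alpha)}{\partial \alpha} = -R_T^{-1}(\alpha)\, \frac{\partial R_T(\alpha)}{\partial \alpha}\, R_T^{-1}(\alpha),$$

using the properties of trace and Kronecker product (Rao and Rao, 1998, page 202) we can verify that

$$g_2(\beta, \alpha, \gamma, \phi) = -\mathrm{vec}\left(\frac{\partial R_T(\alpha)}{\partial \alpha}\right)' \left(R_T(\alpha) \otimes R_T(\alpha)\right)^{-1} \left(\mathrm{vec}(U(\beta, \gamma)) - \phi\, \mathrm{vec}(R_T(\alpha))\right). \tag{2.13}$$

When $\beta$, $\gamma$ and $\phi$ are known, the unbiased estimating function (2.13) is the optimal estimating
function for estimating $\alpha$ if a constant multiple of $\mathrm{Cov}(\mathrm{vec}(U(\beta, \gamma)))$ is used in place of
$(R_T(\alpha) \otimes R_T(\alpha))$. But $\mathrm{Cov}(\mathrm{vec}(U(\beta, \gamma)))$ depends in general on the fourth moments of the
$y_i$'s and we have made no assumptions concerning the fourth moments. However, note that
if the $y_i$'s are normal then $\mathrm{Cov}(\mathrm{vec}(U(\beta, \gamma)))$ is $2\phi\, (R_T(\alpha) \otimes R_T(\alpha))$. Thus the estimating
function (2.13) is the optimal unbiased estimating function for the parameter $\alpha$ when the $y_i$'s
are normally distributed and the other parameters are known. And it will be close to the
optimal unbiased estimating equation whenever a constant multiple of $\mathrm{Cov}(\mathrm{vec}(U(\beta, \gamma)))$ is
approximately equal to $(R_T(\alpha) \otimes R_T(\alpha))$. Similarly, since $E(V(\beta, \alpha)) = \phi R_p(\gamma)$, the function

$$g_3(\beta, \alpha, \gamma, \phi) = \mathrm{tr}\left(\frac{\partial R_p^{-1}(\gamma)}{\partial \gamma} \left(V(\beta, \alpha) - \phi R_p(\gamma)\right)\right)$$

is unbiased. And if the $y_i$'s are independent and normally distributed we can check that
$\mathrm{Cov}(\mathrm{vec}(V(\beta, \alpha))) = 2\phi\, (R_p(\gamma) \otimes R_p(\gamma))$. Therefore, the function

$$g_3(\beta, \alpha, \gamma, \phi) = -\mathrm{vec}\left(\frac{\partial R_p(\gamma)}{\partial \gamma}\right)' \left(R_p(\gamma) \otimes R_p(\gamma)\right)^{-1} \left(\mathrm{vec}(V(\beta, \alpha)) - \phi\, \mathrm{vec}(R_p(\gamma))\right) \tag{2.14}$$

is the optimal unbiased estimating function for estimating $\gamma$ when $\beta$, $\alpha$ and $\phi$ are known,
if the $y_i$'s are normally distributed, and is close to being optimal if a constant multiple of
$\mathrm{Cov}(\mathrm{vec}(V(\beta, \alpha)))$ is approximately equal to $(R_p(\gamma) \otimes R_p(\gamma))$. Now from (2.7), (2.8), (2.9)
and (2.10) we can see that the quasi-least squares estimates satisfy $g_1(\hat\beta, \tilde\alpha, \tilde\gamma) = 0$,

$$\mathrm{tr}\left(\frac{\partial R_T^{-1}(\tilde\alpha)}{\partial \tilde\alpha} \left(U(\hat\beta, \tilde\gamma) - \phi R_T(\hat\alpha)\right)\right) = 0 \tag{2.15}$$

and

$$\mathrm{tr}\left(\frac{\partial R_p^{-1}(\tilde\gamma)}{\partial \tilde\gamma} \left(V(\hat\beta, \tilde\alpha) - \phi R_p(\hat\gamma)\right)\right) = 0 \tag{2.16}$$

for all $\phi$. In particular the equations (2.15) and (2.16) are satisfied when $\phi = \hat\phi$. Thus the
method of quasi-least squares provides a feasible solution to the unbiased estimating equations
$g_i = 0$, for $i = 1, 2, 3$. If the data are from a normal population then these three equations are
also the optimal unbiased estimating equations according to Godambe's criterion. Regardless
of normality, closeness of these estimating equations to optimality corresponds to closeness of
(constant multiples of) $\mathrm{Cov}(\mathrm{vec}(U(\beta, \gamma)))$ and $\mathrm{Cov}(\mathrm{vec}(V(\beta, \alpha)))$ to $(R_T(\alpha) \otimes R_T(\alpha))$ and
$(R_p(\gamma) \otimes R_p(\gamma))$, respectively.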
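The step from the trace form of $g_2$ to the vec form (2.13) can be checked numerically. The following sketch assumes a scalar $\alpha$ and an AR(1) matrix for $R_T(\alpha)$ (an illustrative choice), and uses an arbitrary symmetric matrix in place of $U(\beta, \gamma)$; the helper names are hypothetical.

```python
import numpy as np


def ar1(alpha, m):
    """AR(1) correlation matrix of order m."""
    idx = np.arange(m)
    return alpha ** np.abs(idx[:, None] - idx[None, :])


def d_ar1(alpha, m):
    """Elementwise derivative of the AR(1) matrix with respect to alpha."""
    idx = np.arange(m)
    lag = np.abs(idx[:, None] - idx[None, :])
    return np.where(lag > 0, lag * alpha ** np.maximum(lag - 1, 0), 0.0)


def vec(M):
    """Stack the columns of M into a single vector."""
    return M.flatten(order="F")


t, alpha, phi = 4, 0.5, 1.3
R = ar1(alpha, t)
R_inv = np.linalg.inv(R)
dR = d_ar1(alpha, t)
dR_inv = -R_inv @ dR @ R_inv              # derivative of the inverse

rng = np.random.default_rng(1)
B = rng.standard_normal((t, t))
U = B @ B.T / t                           # arbitrary stand-in for U(beta, gamma)

trace_form = np.trace(dR_inv @ (U - phi * R))
vec_form = -vec(dR) @ np.linalg.inv(np.kron(R, R)) @ (vec(U) - phi * vec(R))
```

The two values coincide up to rounding, which is the identity behind (2.13); the same check applies to (2.14) with $R_p(\gamma)$ and $V(\beta, \alpha)$.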


2.3  Algorithm 


In general a closed form solution does not exist for the estimating equations (2.6), (2.7) and
(2.8), and we need to solve those equations using a recursive procedure like the Newton-Raphson
method. An iterative algorithm for obtaining the first stage quasi-least squares estimates
of $\beta$, $\alpha$ and $\gamma$ can be described as follows:

Step 1: Start with a trial value $\beta_0$.

Step 2: Fix a trial value $\gamma_0$ and compute $U_0 = U(\beta_0, \gamma_0)$.

Step 3: Get the estimate $\alpha_0$ minimizing $\mathrm{tr}(R_T^{-1}(\alpha) U_0)$ with respect to $\alpha$.

Step 4: Compute $V_0 = V(\beta_0, \alpha_0)$.

Step 5: Get the estimate $\gamma_1$ minimizing $\mathrm{tr}(R_p^{-1}(\gamma) V_0)$ with respect to $\gamma$.

Step 6: Repeat Steps 2 through 5 with $\gamma_0 = \gamma_1$, until convergence, and obtain $(\gamma_0, \alpha_0)$.

Step 7: Compute the updated value

$$\beta_1 = \beta_0 + \left[\sum_{i=1}^n D_{i0}' \Sigma_{i0}^{-1} D_{i0}\right]^{-1} \left[\sum_{i=1}^n D_{i0}' \Sigma_{i0}^{-1} z_i(\beta_0)\right]$$

where $\Sigma_{i0} = \Sigma_i(\theta_0)$, $\theta_0 = (\beta_0, \alpha_0, \gamma_0)'$, $D_{i0} = D_i(\beta_0)$ and $D_i(\beta) = \partial \mu_i(\beta) / \partial \beta'$. Stop the
iterative procedure if $\beta_1 \approx \beta_0$ and set $\tilde\beta = \beta_1$, $\tilde\alpha = \alpha_0$ and $\tilde\gamma = \gamma_0$. Otherwise repeat Steps 2
through 6 with $\beta_0$ replaced by $\beta_1$.
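The seven steps above can be sketched in code for one simple special case. The sketch below is not the authors' implementation: it assumes an identity link with $A_i = I$ (so that $z_i(\beta) = y_i - X_i\beta$ and $D_i = X_i$), and it assumes AR(1) structures for both $R_T(\alpha)$ and $R_p(\gamma)$ so that Steps 3 and 5 have a closed form. The function names and the simulated data are hypothetical.

```python
import numpy as np


def ar1(alpha, m):
    """AR(1) correlation matrix of order m."""
    idx = np.arange(m)
    return alpha ** np.abs(idx[:, None] - idx[None, :])


def ar1_stage_one(U):
    """Minimizer of tr(R^{-1}(alpha) U) over AR(1) matrices (closed form)."""
    a = np.trace(U)
    c1 = 2.0 * np.trace(U, offset=1)       # tr(C1 U) for symmetric U
    c2 = a - U[0, 0] - U[-1, -1]           # tr(C2 U)
    if abs(c1) < 1e-12:
        return 0.0
    return ((a + c2) - np.sqrt((a + c2) ** 2 - c1 ** 2)) / c1


def qls_stage_one(Y, X, tol=1e-8, max_iter=200):
    """Y: (n, p, t) responses, X: (n, p*t, q) designs. Returns beta, alpha, gamma."""
    n, p, t = Y.shape
    y = np.stack([Yi.flatten(order="F") for Yi in Y])      # rows are vec(Y_i)
    # Step 1: ordinary least squares as the trial value beta_0.
    beta = np.linalg.lstsq(X.reshape(n * p * t, -1), y.ravel(), rcond=None)[0]
    alpha, gamma = 0.0, 0.0
    for _ in range(max_iter):
        Z = np.stack([(yi - Xi @ beta).reshape(p, t, order="F")
                      for yi, Xi in zip(y, X)])
        for _ in range(max_iter):                           # Steps 2 to 6
            Rp_inv = np.linalg.inv(ar1(gamma, p))
            U = sum(Zi.T @ Rp_inv @ Zi for Zi in Z) / (n * p)
            alpha = ar1_stage_one(U)
            RT_inv = np.linalg.inv(ar1(alpha, t))
            V = sum(Zi @ RT_inv @ Zi.T for Zi in Z) / (n * t)
            gamma_new = ar1_stage_one(V)
            done = abs(gamma_new - gamma) < tol
            gamma = gamma_new
            if done:
                break
        # Step 7: update beta (with A_i = I the residual equals z_i).
        W = np.kron(np.linalg.inv(ar1(alpha, t)), np.linalg.inv(ar1(gamma, p)))
        lhs = sum(Xi.T @ W @ Xi for Xi in X)
        rhs = sum(Xi.T @ W @ (yi - Xi @ beta) for yi, Xi in zip(y, X))
        beta_new = beta + np.linalg.solve(lhs, rhs)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, alpha, gamma
        beta = beta_new
    return beta, alpha, gamma


# Simulated data (hypothetical): cell-means model with X_i = I.
rng = np.random.default_rng(0)
n, p, t = 400, 2, 3
beta_true = np.arange(1.0, p * t + 1)
X = np.tile(np.eye(p * t), (n, 1, 1))
L = np.linalg.cholesky(np.kron(ar1(0.6, t), ar1(0.3, p)))
y_sim = beta_true + rng.standard_normal((n, p * t)) @ L.T
Y = np.stack([yi.reshape(p, t, order="F") for yi in y_sim])

beta_t, alpha_t, gamma_t = qls_stage_one(Y, X)
alpha_hat = 2 * alpha_t / (1 + alpha_t ** 2)   # second stage for AR(1)
```

The value `alpha_t` returned by the loop is the first stage estimate; the last line applies the AR(1) second stage correction from Section 3.2 to it.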


We note that for most of the commonly used correlation structures the second stage in the
quasi-least squares method does not require an iterative procedure, since the estimates can be
obtained in closed form, as shown in the next section.

3  Correlation Structures

Here we consider several popular correlation structures, including the unstructured correlation,
for $R_T(\alpha)$ and illustrate the minimization of $\mathrm{tr}(R_T^{-1}(\alpha) U_0)$ needed in Step 3 of the
algorithm. We will also obtain feasible, unique and often closed form solutions to the equation
(2.9) for these correlation structures. The structures assumed here for $R_T(\alpha)$ can be assumed
for $R_p(\gamma)$ as well, and the form of the solutions $\tilde\alpha$ and $\hat\alpha$ obtained here for $\alpha$ can be used for
obtaining $\tilde\gamma$ in the algorithm described in Section 2.3 and the solution $\hat\gamma$ of the equation (2.10)
in Section 2.1.

3.1  Equicorrelated  Correlation  Structure 

Suppose that the $t$ repeated measurements are equicorrelated, that is, the correlation structure
$R_T(\alpha)$ is of the form $R_T(\alpha) = (1 - \alpha) I + \alpha J$, where $I$ is the identity matrix and $J$ is a matrix
of ones. Since

$$R_T^{-1}(\alpha) = \frac{1}{(1 - \alpha)}\, I - \frac{\alpha}{(1 - \alpha)(1 + (t - 1)\alpha)}\, J$$

we have

$$\mathrm{tr}(R_T^{-1}(\alpha) U_0) = \frac{1}{(1 - \alpha)}\, \mathrm{tr}(U_0) - \frac{\alpha}{(1 - \alpha)(1 + (t - 1)\alpha)}\, \mathrm{tr}(J U_0) = \frac{a}{(1 - \alpha)} - \frac{\alpha\, b}{(1 - \alpha)(1 + (t - 1)\alpha)}, \tag{3.17}$$

where $a = \mathrm{tr}(U_0)$, $b = \mathrm{tr}(J U_0)$. Taking derivatives, we can check that the function (3.17) has
a unique point of minimum in the interval $(-1/(t - 1),\, 1)$, given by

$$\tilde\alpha = \frac{-a(t - 1) + \sqrt{b\,(t - 1)\,(at - b)}}{(t - 1)\,(a(t - 1) - b)}.$$

Also in this case there is a unique solution to the equation (2.9) in the interval $(-1/(t - 1),\, 1)$
and it is given by

$$\hat\alpha = \kappa_1(\tilde\alpha) = \frac{\tilde\alpha^2 (t - 2) + 2\tilde\alpha}{[1 + \tilde\alpha^2 (t - 1)]}. \tag{3.18}$$

We can verify that the function $\kappa_1(\cdot)$ is a continuous, one-to-one and onto function on the
interval $(-1/(t - 1),\, 1)$.
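The two closed forms for the equicorrelated structure can be sketched as follows. The function names are hypothetical, and the code assumes that the denominator $(t-1)(a(t-1)-b)$ is nonzero. As a check on the formulas, the example feeds in a scalar multiple of an equicorrelated matrix with known $\alpha$, in which case the second stage map (3.18) should return that $\alpha$.

```python
import numpy as np


def equi(alpha, t):
    """Equicorrelated matrix (1 - alpha) I + alpha J of order t."""
    return (1.0 - alpha) * np.eye(t) + alpha * np.ones((t, t))


def equi_stage_one(U):
    """Minimizer of tr(R^{-1}(alpha) U) over equicorrelated matrices."""
    t = U.shape[0]
    a, b = np.trace(U), U.sum()            # b = tr(J U) is the sum of all entries
    denom = (t - 1) * (a * (t - 1) - b)    # assumed nonzero in this sketch
    return (-a * (t - 1) + np.sqrt(b * (t - 1) * (a * t - b))) / denom


def equi_stage_two(alpha_tilde, t):
    """Second stage map kappa_1 of equation (3.18)."""
    return (alpha_tilde ** 2 * (t - 2) + 2 * alpha_tilde) / (1 + alpha_tilde ** 2 * (t - 1))


t, alpha_true = 4, 0.5
U0 = 1.7 * equi(alpha_true, t)             # scalar multiple of the true structure
alpha_tilde = equi_stage_one(U0)
alpha_hat = equi_stage_two(alpha_tilde, t)

# Brute-force check of the first stage minimizer on a grid inside (-1/(t-1), 1).
grid = np.linspace(-0.33, 0.99, 1321)
objective = [np.trace(np.linalg.solve(equi(g, t), U0)) for g in grid]
alpha_grid = grid[int(np.argmin(objective))]
```

In this example the first stage value `alpha_tilde` differs from the true $\alpha$, while `alpha_hat` recovers it, which is the bias correction that the second stage is designed to provide.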
3.2  First Order Autoregressive (AR(1)) Correlation Structure

Consider the situation where the correlation between the $t$ repeated measurements decreases
with time. A commonly used correlation structure in this situation is the AR(1) structure,
$R_T(\alpha) = [\alpha^{|i - j|}]$. Here

$$R_T^{-1}(\alpha) = \frac{1}{(1 - \alpha^2)}\, [I - \alpha\, C_1 + \alpha^2\, C_2],$$

where $C_1$ is a tridiagonal matrix with 0 on the diagonal and 1 on the lower and upper diagonals
and $C_2 = \mathrm{diag}(0, 1, \ldots, 1, 0)$. Therefore

$$\mathrm{tr}(R_T^{-1}(\alpha) U_0) = \frac{1}{(1 - \alpha^2)}\, [\mathrm{tr}(U_0) - \alpha\, \mathrm{tr}(C_1 U_0) + \alpha^2\, \mathrm{tr}(C_2 U_0)] = \frac{1}{(1 - \alpha^2)}\, [a - \alpha\, c_1 + \alpha^2\, c_2], \tag{3.19}$$

where $a = \mathrm{tr}(U_0)$, $c_1 = \mathrm{tr}(C_1 U_0)$, and $c_2 = \mathrm{tr}(C_2 U_0)$. We can easily check that (3.19) has a
unique point of minimum in the interval $(-1, 1)$ given by

$$\tilde\alpha = \frac{(a + c_2) - \sqrt{(a + c_2)^2 - c_1^2}}{c_1}. \tag{3.20}$$

In this case the feasible solution to the equation (2.9) is

$$\hat\alpha = \kappa_2(\tilde\alpha) = \frac{2\tilde\alpha}{1 + \tilde\alpha^2}. \tag{3.21}$$

The function $\kappa_2(\cdot)$ is a continuous, one-to-one and onto function on the interval $(-1, 1)$.
We will use the above estimate $\hat\alpha$ in the examples discussed in Section 5.
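A minimal sketch of the AR(1) closed forms (3.20) and (3.21) is given below. The matrix `U0` is a randomly generated positive definite stand-in for $U_0$ (an assumption made only for the demonstration), and the function names are hypothetical. The grid search is included only to cross-check the closed form minimizer.

```python
import numpy as np


def ar1(alpha, m):
    """AR(1) correlation matrix of order m."""
    idx = np.arange(m)
    return alpha ** np.abs(idx[:, None] - idx[None, :])


def ar1_stage_one(U):
    """First stage estimate (3.20): minimizer of tr(R^{-1}(alpha) U)."""
    a = np.trace(U)
    c1 = 2.0 * np.trace(U, offset=1)       # tr(C1 U) for symmetric U
    c2 = a - U[0, 0] - U[-1, -1]           # tr(C2 U)
    if abs(c1) < 1e-12:
        return 0.0
    return ((a + c2) - np.sqrt((a + c2) ** 2 - c1 ** 2)) / c1


def ar1_stage_two(alpha_tilde):
    """Second stage map kappa_2 of equation (3.21)."""
    return 2.0 * alpha_tilde / (1.0 + alpha_tilde ** 2)


t = 4
rng = np.random.default_rng(2)
B = rng.standard_normal((t, 10))
U0 = B @ B.T / 10                          # arbitrary positive definite stand-in

alpha_tilde = ar1_stage_one(U0)
alpha_hat = ar1_stage_two(alpha_tilde)

# Cross-check the closed form against a grid search over (-1, 1).
grid = np.linspace(-0.99, 0.99, 1999)
objective = [np.trace(np.linalg.solve(ar1(g, t), U0)) for g in grid]
alpha_grid = grid[int(np.argmin(objective))]
```

Since $|\kappa_2(x)| \ge |x|$ on $(-1, 1)$, the second stage estimate is never closer to zero than the first stage estimate.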


3.3  Tri-Diagonal  Structure 

Let $R_T(\alpha)$ be a tri-diagonal matrix, that is, the diagonal elements of $R_T(\alpha)$ are one, all
the elements immediately above and immediately below the diagonal are equal to $\alpha$, and the other
elements are zero. Here $R_T^{-1}(\alpha)$ does not have a closed form but the matrix $R_T(\alpha)$ admits a
spectral decomposition

$$R_T(\alpha) = P\, \Lambda(\alpha)\, P'$$

where $P$, the matrix of orthogonal eigenvectors, does not depend on $\alpha$. See Chaganty (1997),
Example 4.2. Now

$$\mathrm{tr}(R_T^{-1}(\alpha) U_0) = \mathrm{tr}(\Lambda^{-1}(\alpha) P' U_0 P) = \sum_{k=1}^t \frac{u_k}{1 + 2\alpha \cos\left(\frac{k\pi}{t + 1}\right)} \tag{3.22}$$

where $u_k$ is the $k$th diagonal element of $P' U_0 P$. It is well known that $R_T(\alpha)$ is positive definite
if and only if $\alpha$ falls in the interval $(\alpha_1, \alpha_t)$, where $\alpha_1 = -1/(2\cos(\pi/(t + 1)))$ and
$\alpha_t = -1/(2\cos(t\pi/(t + 1))) = -\alpha_1$.

We can verify that (3.22) has a unique point of minimum $\tilde\alpha$ in the interval $(\alpha_1, \alpha_t)$, which
can be computed numerically. In this case the second stage estimate $\hat\alpha$ is in closed form
and is given by

$$\hat\alpha = \kappa_3(\tilde\alpha) = -\frac{1}{2}\, \frac{\sum_{k=1}^t b_k}{\sum_{k=1}^t b_k \cos(k\pi/(t + 1))}$$

where

$$b_k = \frac{\cos(k\pi/(t + 1))}{(1 + 2\tilde\alpha \cos(k\pi/(t + 1)))^2}.$$

3.4  Unstructured  Correlation  matrix 


Suppose that the correlation between the $t$ repeated measurements, $R_T(\alpha) = R_T$, is an
unstructured positive definite correlation matrix. As shown in Chaganty (1997), the point of minimum
$\tilde R_T$ in Step 3 of the algorithm described in Section 2.3 can be obtained recursively, starting
with any positive definite diagonal matrix $\Delta_0$ and computing $\Delta_k = \mathrm{diag}\left((\Delta_{k-1}^{1/2}\, U_0\, \Delta_{k-1}^{1/2})^{1/2}\right)$
at the $k$th step, and stopping the recursive process as soon as $\Delta_k \approx \Delta_{k-1} = \Delta$. The matrix
$\tilde R_T = \Delta^{-1/2}\, (\Delta^{1/2}\, U_0\, \Delta^{1/2})^{1/2}\, \Delta^{-1/2}$. The bias corrected correlation matrix is given by

$$\hat R_T = \begin{cases} \tilde R_T\, \tilde\Lambda_T\, \tilde R_T & \text{if } \tilde\Lambda_T > 0 \\ (\mathrm{diag}(U_0))^{-1/2}\, U_0\, (\mathrm{diag}(U_0))^{-1/2} & \text{otherwise,} \end{cases} \tag{3.23}$$

where $\tilde\Lambda_T = \mathrm{diag}\left[(\tilde R_T \circ \tilde R_T)^{-1} e\right]$, $e$ is a vector of ones and $\circ$ denotes the Hadamard
product (see Chaganty and Shults (1999)). Similarly, we can construct an estimate $\hat R_p$ for
$R_p(\gamma) = R_p$, when it is an unknown unstructured correlation matrix. We will use the estimate
$\hat R_p$ in the examples described in Section 5.
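The recursion for the unstructured case and the bias correction (3.23) can be sketched as below. The code reads the recursion as $\Delta_k = \mathrm{diag}((\Delta_{k-1}^{1/2} U_0 \Delta_{k-1}^{1/2})^{1/2})$, computes symmetric square roots through an eigendecomposition, and starts from $\Delta_0 = I$; these are assumptions of the sketch, and the function names and the test matrix are hypothetical.

```python
import numpy as np


def sym_sqrt(M):
    """Symmetric square root of a positive semidefinite matrix."""
    w, Q = np.linalg.eigh(M)
    return (Q * np.sqrt(np.clip(w, 0.0, None))) @ Q.T


def unstructured_stage_one(U, tol=1e-13, max_iter=1000):
    """Fixed-point recursion for the first stage estimate of R_T."""
    d = np.ones(U.shape[0])                # diagonal of Delta_0 = I
    for _ in range(max_iter):
        s = np.sqrt(d)
        d_new = np.diag(sym_sqrt(U * np.outer(s, s)))
        converged = np.max(np.abs(d_new - d)) < tol
        d = d_new
        if converged:
            break
    s = np.sqrt(d)
    # Delta^{-1/2} (Delta^{1/2} U Delta^{1/2})^{1/2} Delta^{-1/2}
    return sym_sqrt(U * np.outer(s, s)) / np.outer(s, s)


def unstructured_stage_two(R_tilde, U):
    """Bias corrected matrix of equation (3.23)."""
    lam = np.linalg.solve(R_tilde * R_tilde, np.ones(R_tilde.shape[0]))
    if np.all(lam > 0):
        return R_tilde @ np.diag(lam) @ R_tilde
    s = 1.0 / np.sqrt(np.diag(U))          # fallback: standardize U directly
    return U * np.outer(s, s)


R_true = np.array([[1.0, 0.5, 0.2],
                   [0.5, 1.0, 0.4],
                   [0.2, 0.4, 1.0]])
U0 = 1.5 * R_true                          # scalar multiple of a correlation matrix
R_tilde = unstructured_stage_one(U0)
R_hat = unstructured_stage_two(R_tilde, U0)
```

When `U0` is proportional to a correlation matrix, as here, the corrected matrix `R_hat` reproduces that correlation matrix, while `R_tilde` has off-diagonal entries shrunk toward zero.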
4  Large sample inference


In this section we will study the large sample properties of the quasi-least squares estimates.
We show that the estimates are consistent and asymptotically normal. We also propose a test
statistic for testing a hypothesis concerning the regression parameter and derive its asymptotic
distribution.


4.1  Asymptotic  distribution 

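Here we will establish consistency and joint asymptotic normality of the quasi-least squares
estimates. Let $\theta = (\beta, \alpha, \gamma)'$ be the vector consisting of the regression and the correlation
parameters. Note that the first stage quasi-least squares estimate $\tilde\theta$ is the solution of the
equation $\sum_{i=1}^n h_i(\theta) = 0$, where $h_i(\theta) = (h_{1i}(\theta), h_{2i}(\theta), h_{3i}(\theta))'$ and

$$h_{1i}(\theta) = D_i'(\beta) A_i^{-1/2}(\beta) \left(R_T^{-1}(\alpha) \otimes R_p^{-1}(\gamma)\right) z_i(\beta)$$

$$h_{2i}(\theta) = \frac{1}{p}\, \mathrm{tr}\left(\frac{\partial R_T^{-1}(\alpha)}{\partial \alpha}\, Z_i'(\beta) R_p^{-1}(\gamma) Z_i(\beta)\right)$$

$$h_{3i}(\theta) = \frac{1}{t}\, \mathrm{tr}\left(\frac{\partial R_p^{-1}(\gamma)}{\partial \gamma}\, Z_i(\beta) R_T^{-1}(\alpha) Z_i'(\beta)\right)$$

The expected value of $h_i(\theta)$ does not depend on $i$ and equals

$$\nu(\theta) = \left(0,\ \phi\, \mathrm{tr}\left(\frac{\partial R_T^{-1}(\alpha)}{\partial \alpha} R_T(\alpha)\right),\ \phi\, \mathrm{tr}\left(\frac{\partial R_p^{-1}(\gamma)}{\partial \gamma} R_p(\gamma)\right)\right)'. \tag{4.24}$$

Since $E(z_i(\beta)) = 0$, we can check that $I_n(\theta) = -\frac{1}{n} \sum_{i=1}^n E\left(\frac{\partial h_i(\theta)}{\partial \theta'}\right)$ is of the form

$$I_n(\theta) = \begin{bmatrix} I_{n11}(\theta) & 0 & 0 \\ 0 & I_{n22}(\theta) & I_{n23}(\theta) \\ 0 & I_{n23}'(\theta) & I_{n33}(\theta) \end{bmatrix}. \tag{4.25}$$

In the above the three partitions are made according to the dimensions of the three vectors $\beta$,
$\alpha$ and $\gamma$ respectively. Similarly, we can partition $M_n(\theta) = \frac{1}{n} \sum_{i=1}^n \mathrm{Cov}(h_i(\theta))$ as

$$M_n(\theta) = \begin{bmatrix} M_{n11}(\theta) & M_{n12}(\theta) & M_{n13}(\theta) \\ M_{n12}'(\theta) & M_{n22}(\theta) & M_{n23}(\theta) \\ M_{n13}'(\theta) & M_{n23}'(\theta) & M_{n33}(\theta) \end{bmatrix} \tag{4.26}$$

where $M_{njk} = \frac{1}{n} \sum_{i=1}^n \mathrm{Cov}(h_{ji}(\theta), h_{ki}(\theta))$. We can check that $M_{n11}(\theta) = \phi\, I_{n11}(\theta)$, where

$$I_{n11}(\theta) = \frac{1}{n} \sum_{i=1}^n D_i'(\beta)\, \Sigma_i^{-1}(\theta)\, D_i(\beta). \tag{4.27}$$

It is possible to express the other matrices $M_{njk}$ and $I_{njk}$ in (4.25) and (4.26) as functions of $\beta$,
$\alpha$, $\gamma$ and $\phi$ explicitly. See Chaganty (1997, page 47) for some details concerning those formulas.
The next theorem establishes the asymptotic normality of the first stage quasi-least squares
estimates. Below we will use the acronym AN for asymptotically normal as in Serfling (1980,
page 20).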

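Theorem 4.1 Let $\theta = (\beta, \alpha, \gamma)'$ be fixed. Let $\tilde\theta = (\tilde\beta, \tilde\alpha, \tilde\gamma)'$ be the solution of the equation
$\sum_{i=1}^n h_i(\theta) = 0$. Let $M_n(\theta) \to M(\theta)$ and $I_n(\theta) \to I(\theta)$ as $n \to \infty$. Assume that a central
limit theorem holds for the summands $h_i(\theta)$ and they satisfy, as a function of $\theta$, the regularity
conditions needed for a Taylor series expansion to hold. Then

$$\sqrt{n}\left(\tilde\theta - \theta - [I(\theta)]^{-1} \nu(\theta)\right) \to N\left(0,\ [I(\theta)]^{-1} M(\theta) [I(\theta)]^{-1}\right) \tag{4.28}$$

as $n \to \infty$, where $\nu(\theta)$ is defined in (4.24).

Proof: Since $\sum_{i=1}^n h_i(\tilde\theta) = 0$, using a Taylor series expansion and a standard argument we can
verify that the asymptotic distribution of $(\tilde\theta - \theta)$ is the same as the asymptotic distribution of

$$[I_n(\theta)]^{-1} \left[\frac{1}{n} \sum_{i=1}^n h_i(\theta)\right]. \tag{4.29}$$

Note that $E(h_i(\theta)) = \nu(\theta)$ for all $i$. Since $M_n(\theta)$ converges to $M(\theta)$ and the summands $h_i(\theta)$
satisfy a central limit theorem, we conclude that

$$\frac{1}{n} \sum_{i=1}^n h_i(\theta) \ \text{ is } \ AN\left(\nu(\theta),\ \frac{M(\theta)}{n}\right). \tag{4.30}$$

Since $I_n(\theta)$ converges to $I(\theta)$, from (4.29) and (4.30) we get that

$$\tilde\theta - \theta \ \text{ is } \ AN\left([I(\theta)]^{-1} \nu(\theta),\ \frac{[I(\theta)]^{-1} M(\theta) [I(\theta)]^{-1}}{n}\right) \tag{4.31}$$

which is equivalent to (4.28). This completes the proof of the theorem. □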

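Since $I(\theta)$ is the limit of (4.25), from the above theorem and using (4.24), we can see that the
first stage quasi-least squares estimate $\tilde\beta$ is a consistent estimate of $\beta$, whereas $\tilde\alpha$ and $\tilde\gamma$ are
asymptotically biased. To get consistent estimates of $\alpha$ and $\gamma$ we will make a transformation
on $\tilde\theta$, which depends on the structures of the correlation matrices $R_T(\alpha)$ and $R_p(\gamma)$. Let

$$b(\tilde\theta, \theta) = \left(b_1(\tilde\beta, \beta),\ b_2(\tilde\alpha, \alpha),\ b_3(\tilde\gamma, \gamma)\right)' \tag{4.32}$$

be a function of $\tilde\theta = (\tilde\beta, \tilde\alpha, \tilde\gamma)'$ and $\theta = (\beta, \alpha, \gamma)'$, where

$$b_1(\tilde\beta, \beta) = \tilde\beta - \beta$$

$$b_2(\tilde\alpha, \alpha) = \mathrm{tr}\left(\left.\frac{\partial R_T^{-1}(\alpha)}{\partial \alpha}\right|_{\alpha = \tilde\alpha} R_T(\alpha)\right)$$

$$b_3(\tilde\gamma, \gamma) = \mathrm{tr}\left(\left.\frac{\partial R_p^{-1}(\gamma)}{\partial \gamma}\right|_{\gamma = \tilde\gamma} R_p(\gamma)\right). \tag{4.33}$$

Note that the second stage quasi-least squares estimate is the solution $\hat\theta = \kappa(\tilde\theta)$ of the equation
$b(\tilde\theta, \theta) = 0$. The next theorem shows that $\hat\theta$ is a consistent estimate of $\theta$ and asymptotically
normal.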

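Theorem 4.2 Let $\theta = (\beta, \alpha, \gamma)'$ be fixed. Let $\tilde\theta$ be as in Theorem 4.1 and $b(\tilde\theta, \theta)$ be as defined
in (4.32). Assume that the conditions of Theorem 4.1 hold. Let $\hat\theta = (\hat\beta, \hat\alpha, \hat\gamma)'$ be the solution,
say $\kappa(\tilde\theta)$, of the equation $b(\tilde\theta, \theta) = 0$, where $\kappa(\cdot)$ is a continuous function. Then $\hat\theta$ is a consistent
estimate of $\theta$ and $\sqrt{n}\,(\hat\theta - \theta)$ converges in distribution to a normal distribution with mean 0
and covariance matrix

$$\Gamma(\theta) = [\nabla\kappa(\theta^*)]'\ [I(\theta)]^{-1} M(\theta) [I(\theta)]^{-1}\ [\nabla\kappa(\theta^*)] \tag{4.34}$$

where $\theta^* = \theta + [I(\theta)]^{-1} \nu(\theta)$ and $\nabla\kappa(\theta^*)$ is $\partial\kappa(\theta)/\partial\theta'$ evaluated at $\theta = \theta^*$. Finally, $\hat\phi$ as defined
in (2.11) is a consistent estimate of $\phi$.

Proof: From Theorem 4.1, we know that $\tilde\theta = (\tilde\beta, \tilde\alpha, \tilde\gamma)'$ converges to $\theta^* = (\beta^*, \alpha^*, \gamma^*)'$ as
$n \to \infty$. Note that $\beta^* = \beta$. Using the weak law of large numbers, we can check that

$$U(\tilde\beta, \tilde\gamma) = \frac{1}{np} \sum_{i=1}^n Z_i'(\tilde\beta) R_p^{-1}(\tilde\gamma) Z_i(\tilde\beta) \ \to\ \frac{1}{p}\, \mathrm{tr}\left(R_p^{-1}(\gamma^*) R_p(\gamma)\right) R_T(\alpha) \tag{4.35}$$

and

$$V(\tilde\beta, \tilde\alpha) = \frac{1}{nt} \sum_{i=1}^n Z_i(\tilde\beta) R_T^{-1}(\tilde\alpha) Z_i'(\tilde\beta) \ \to\ \frac{1}{t}\, \mathrm{tr}\left(R_T^{-1}(\alpha^*) R_T(\alpha)\right) R_p(\gamma). \tag{4.36}$$

Taking the limit as $n \to \infty$, using (4.35) and (4.36) we get

$$0 = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n h_i(\tilde\theta) = \begin{bmatrix} 0 \\ \frac{1}{p}\, \mathrm{tr}\left(R_p^{-1}(\gamma^*) R_p(\gamma)\right) b_2(\alpha^*, \alpha) \\ \frac{1}{t}\, \mathrm{tr}\left(R_T^{-1}(\alpha^*) R_T(\alpha)\right) b_3(\gamma^*, \gamma) \end{bmatrix} \tag{4.37}$$

where the functions $b_2$ and $b_3$ are defined in (4.33). Since $\beta^* = \beta$, from (4.37) we can see
that $b(\theta^*, \theta) = 0$. Thus $\theta = \kappa(\theta^*)$, and therefore $\hat\theta = \kappa(\tilde\theta)$ converges to $\theta$. This establishes
consistency of the second stage quasi-least squares estimate $\hat\theta$. By the delta theorem it follows
that $\sqrt{n}\,(\hat\theta - \theta)$ converges in distribution to a normal distribution with mean 0 and covariance
matrix $\Gamma(\theta)$. Finally, consistency of $\hat\theta$ implies that $\hat\phi$ is a consistent estimate of $\phi$. This completes
the proof of the theorem. □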
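The roles of Theorems 4.1 and 4.2 can be illustrated with the AR(1) structure of Section 3.2. By (4.35) the matrix $U(\tilde\beta, \tilde\gamma)$ converges to a positive scalar multiple of $R_T(\alpha)$, so the sketch below feeds $R_T(\alpha)$ itself into the first stage closed form (3.20) to obtain the limit $\alpha^*$, and then applies the second stage map (3.21). The chosen values of $\alpha$ and $t$ and the function names are hypothetical.

```python
import numpy as np


def ar1(alpha, m):
    """AR(1) correlation matrix of order m."""
    idx = np.arange(m)
    return alpha ** np.abs(idx[:, None] - idx[None, :])


def ar1_stage_one(U):
    """First stage closed form (3.20)."""
    a = np.trace(U)
    c1 = 2.0 * np.trace(U, offset=1)
    c2 = a - U[0, 0] - U[-1, -1]
    return ((a + c2) - np.sqrt((a + c2) ** 2 - c1 ** 2)) / c1


def ar1_stage_two(alpha_tilde):
    """Second stage map (3.21)."""
    return 2.0 * alpha_tilde / (1.0 + alpha_tilde ** 2)


t = 3
rows = []
for alpha in (0.2, 0.5, 0.8):
    U_limit = ar1(alpha, t)                 # limit of U up to a positive scalar
    alpha_star = ar1_stage_one(U_limit)     # limit of the first stage estimate
    rows.append((alpha, alpha_star, ar1_stage_two(alpha_star)))

for alpha, alpha_star, alpha_back in rows:
    print(f"alpha={alpha:.1f}  alpha*={alpha_star:.4f}  kappa(alpha*)={alpha_back:.4f}")
```

In each row the first stage limit $\alpha^*$ is smaller in magnitude than $\alpha$, which is the asymptotic bias described by Theorem 4.1, while $\kappa_2(\alpha^*)$ equals $\alpha$, which is the mechanism behind the consistency claim of Theorem 4.2.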


Remark 4.1 Since $M_{n11}(\theta) = \phi\, I_{n11}(\theta)$, taking the limit as $n \to \infty$, using (4.27) and Theorem 4.2,
we can see that $\sqrt{n}\,(\hat\beta - \beta)$ converges to a $q$-variate normal distribution with mean 0 and
covariance matrix $\phi\, [C(\theta)]^{-1}$, where

$$C(\theta) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n D_i'(\beta)\, \Sigma_i^{-1}(\theta)\, D_i(\beta). \tag{4.38}$$

Thus $\hat\beta$ is asymptotically an efficient estimator. The same asymptotic property also holds for
the estimate of $\beta$ obtained by solving the equation (2.6) after substituting $\hat\alpha$, $\hat\gamma$ for $\alpha$ and $\gamma$
respectively.


4.2  Test  of  Hypothesis 

Suppose that we are interested in testing the null hypothesis $K'\beta = m$, where $m$ is a known
$s \times 1$ vector and $K$ is a known $q \times s$ matrix of rank $s < q$. We propose the test statistic

$$T_n = \frac{(K'\hat\beta - m)' \left[K' \left(\sum_{i=1}^n D_i'(\hat\beta)\, \hat\Sigma_i^{-1}\, D_i(\hat\beta)\right)^{-1} K\right]^{-1} (K'\hat\beta - m)}{\hat\phi} \tag{4.39}$$

where $\hat\Sigma_i = \Sigma_i(\hat\theta)$ and $\hat\phi$ is defined in (2.11). Large values of $T_n$ are considered to be significant.
It can be seen easily from Theorem 4.2 and Remark 4.1 that, under the above null hypothesis,
$T_n$ converges to a central $\chi^2$ with $s$ degrees of freedom. We will use the test statistic $T_n$ to test
various hypotheses in the examples considered in Section 5.
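A minimal sketch of how the statistic (4.39) might be computed is given below. The function `qls_wald_statistic` and its argument names are hypothetical; it takes the fitted quantities as inputs and does not perform the estimation. The helper `chi2_sf_even` uses the closed form of the chi-square upper tail that is valid for even degrees of freedom only, which avoids any dependency beyond NumPy and the standard library.

```python
import numpy as np
from math import exp, factorial


def qls_wald_statistic(beta_hat, D_list, Sigma_list, phi_hat, K, m):
    """Test statistic (4.39) for H0: K' beta = m, given fitted quantities."""
    info = sum(D.T @ np.linalg.solve(S, D) for D, S in zip(D_list, Sigma_list))
    diff = K.T @ beta_hat - m
    middle = K.T @ np.linalg.solve(info, K)
    return float(diff @ np.linalg.solve(middle, diff) / phi_hat)


def chi2_sf_even(x, df):
    """Upper tail probability of a chi-square variable, even df only."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    h = x / 2.0
    return exp(-h) * sum(h ** k / factorial(k) for k in range(df // 2))


# Toy illustration (hypothetical numbers): q = 2, s = 1, four subjects.
D_list = [np.eye(2)] * 4
Sigma_list = [np.eye(2)] * 4
K = np.array([[1.0], [0.0]])
T_toy = qls_wald_statistic(np.array([1.0, 0.0]), D_list, Sigma_list,
                           1.0, K, np.array([0.0]))
```

For a statistic with an odd number of degrees of freedom a general chi-square routine from a statistics library would be needed in place of `chi2_sf_even`.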




5  Examples 


To illustrate the estimation of the regression and various correlation and scale parameters
(by implementing the algorithm described in Section 2.3) and to perform certain hypothesis
tests, in this section we present the analyses of two real life data sets. Both examples
have three response variables measured over three time periods. For both examples we
have fit a general (unstructured) correlation structure for the three response variables and an
AR(1) structure for the measurements observed over the three time periods. In the first example
there is only one group, whereas in the second there are two groups.

5.1  AIDS  Data 

Here we consider the data set given in Table 1 of Thompson (1991). Twenty-seven patients were
involved in a pilot study in which a new drug was being tested for treating AIDS. Measurements
on three variables ($p = 3$), TMHR score, Karnofsky score, and T-4 cell count, were taken on
each of the $n = 27$ patients at three time periods ($t = 3$): at the beginning, 90 days after
the treatment, and 180 days after the treatment. We fit a regression model for these data with
the correlation structure $R_t(\alpha) \otimes R_p(\gamma)$, where $R_t(\alpha)$ is the AR(1) correlation
matrix and $R_p(\gamma) = R_p$ is the unstructured correlation matrix. To make the variances of
the variables approximately equal, we divide each response variable by its
sample standard deviation (actually a value close to it). For the present example, we divided
the observations corresponding to the three variables by 2.4, 12.6, and
276.0, respectively. The interest is in testing the effect of the drug over time. Hence the null
hypothesis we want to test is that there is no effect over time for each of the three variables.
As a preparation for testing this hypothesis, suppose $y_{ijk}$ is the observation on the $j$th
variable taken at the $k$th time period for the $i$th individual. Then consider the model

\[
E(y_{ijk}) = \mu_{jk}, \quad j = 1, 2, 3; \; k = 1, 2, 3; \; \text{and } i = 1, \ldots, 27,
\]

or $E(y_i) = X_i\beta = \mu$, where $y_i = (y_{i11}, y_{i21}, y_{i31}, \ldots, y_{i33})'$ and $\mu = (\mu_{11}, \mu_{21}, \ldots, \mu_{23}, \mu_{33})'$. Then
the parameter estimates obtained using our algorithm are:

\[
\hat\mu = (2.1836, 6.3198, 1.1770, 0.8488, 7.3486, 1.2093, 0.9259, 7.6426, 1.0645)',
\]


\[
\hat R_p = \begin{pmatrix}
 1.0000 & -0.5124 & -0.4687 \\
-0.5124 &  1.0000 &  0.4700 \\
-0.4687 &  0.4700 &  1.0000
\end{pmatrix},
\]

$\hat\alpha = 0.6696$, and $\hat\phi = 0.8760$.
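To make the assumed correlation structure concrete, the sketch below assembles the $9 \times 9$ matrix $R_t(\hat\alpha) \otimes \hat R_p$ from the estimates reported above. It uses the standard AR(1) form in which the $(k, l)$ entry of $R_t(\alpha)$ is $\alpha^{|k-l|}$; the helper names `ar1` and `kron` are illustrative and not taken from the paper.

```python
def ar1(alpha, t):
    """AR(1) correlation matrix: entry (k, l) equals alpha ** |k - l|."""
    return [[alpha ** abs(k - l) for l in range(t)] for k in range(t)]


def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a_ik * b_jl for a_ik in a_row for b_jl in b_row]
            for a_row in a for b_row in b]


# Estimates reported for the AIDS data.
alpha_hat = 0.6696
r_p_hat = [[1.0000, -0.5124, -0.4687],
           [-0.5124, 1.0000, 0.4700],
           [-0.4687, 0.4700, 1.0000]]

corr = kron(ar1(alpha_hat, 3), r_p_hat)

# Within each time period the variable index runs fastest, matching the
# ordering of y_i. For example, corr[0][3] is the correlation between
# variable 1 at time 1 and variable 1 at time 2.
print(len(corr), len(corr[0]))
print(round(corr[0][3], 4), round(corr[0][4], 4), round(corr[0][6], 4))
```

Multiplying this matrix by $\hat\phi$ gives the fitted covariance matrix of the scaled responses under the assumed structure.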


The  null  hypothesis  of  interest  then  can  be  expressed  as 


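\[
H_0 : \mu_{j1} = \mu_{j2} = \mu_{j3}, \quad \text{for all } j = 1, 2, 3,
\]

or $H_0 : K'\mu = 0$, with

\[
K' = \begin{pmatrix}
1 & 0 & 0 & -1 &  0 &  0 &  0 &  0 &  0 \\
0 & 0 & 0 &  1 &  0 &  0 & -1 &  0 &  0 \\
0 & 1 & 0 &  0 & -1 &  0 &  0 &  0 &  0 \\
0 & 0 & 0 &  0 &  1 &  0 &  0 & -1 &  0 \\
0 & 0 & 1 &  0 &  0 & -1 &  0 &  0 &  0 \\
0 & 0 & 0 &  0 &  0 &  1 &  0 &  0 & -1
\end{pmatrix}_{6 \times 9}.
\]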

To test $H_0$ we use the test statistic (4.39), whose observed value is 161.1373. The
P-value from the chi-square distribution with 6 degrees of freedom is 0.0000 to four decimal
places, so we reject the null hypothesis.

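Next we want to see whether the change in the patient’s condition occurred during the
first 90 days and/or the second 90 days. For that we test the two hypotheses $H_0 : K_1'\mu = 0$
and $H_0 : K_2'\mu = 0$, where

\[
K_1' = \begin{pmatrix}
1 & 0 & 0 & -1 &  0 &  0 & 0 & 0 & 0 \\
0 & 1 & 0 &  0 & -1 &  0 & 0 & 0 & 0 \\
0 & 0 & 1 &  0 &  0 & -1 & 0 & 0 & 0
\end{pmatrix},
\]

\[
K_2' = \begin{pmatrix}
0 & 0 & 0 & 1 & 0 & 0 & -1 &  0 &  0 \\
0 & 0 & 0 & 0 & 1 & 0 &  0 & -1 &  0 \\
0 & 0 & 0 & 0 & 0 & 1 &  0 &  0 & -1
\end{pmatrix}.
\]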
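The two contrast matrices can be generated mechanically from the ordering of the mean vector (variable index running fastest within each time period). The following sketch builds them and evaluates the estimated contrasts from the reported mean estimates; the helper names `contrast_row` and `apply_contrasts` are hypothetical.

```python
# Estimated means for the AIDS data, ordered as
# (variable 1..3 at time 1, variable 1..3 at time 2, variable 1..3 at time 3).
mu_hat = [2.1836, 6.3198, 1.1770, 0.8488, 7.3486, 1.2093, 0.9259, 7.6426, 1.0645]


def contrast_row(p, t, var, time_a, time_b):
    """Row of length p*t encoding mu[var, time_a] - mu[var, time_b] (1-based)."""
    row = [0] * (p * t)
    row[(time_a - 1) * p + (var - 1)] = 1
    row[(time_b - 1) * p + (var - 1)] = -1
    return row


def apply_contrasts(k_t, mu):
    """Evaluate K' mu for K' stored as a list of rows."""
    return [sum(c * x for c, x in zip(row, mu)) for row in k_t]


k1_t = [contrast_row(3, 3, j, 1, 2) for j in (1, 2, 3)]  # time 1 versus time 2
k2_t = [contrast_row(3, 3, j, 2, 3) for j in (1, 2, 3)]  # time 2 versus time 3

d1 = apply_contrasts(k1_t, mu_hat)
d2 = apply_contrasts(k2_t, mu_hat)
print([round(v, 4) for v in d1])
print([round(v, 4) for v in d2])
```

Stacking the rows of `k1_t` and `k2_t` gives a $6 \times 9$ matrix with the same rows as the overall contrast matrix used for the hypothesis of no change over time; the order of the rows does not change the hypothesis being tested.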

The test statistic for the first 90 days is 140.3697, with a P-value of 0.0000 from the
chi-square approximation on three degrees of freedom. The test statistic for the second 90 days
is 11.7403, with a P-value of 0.0083. Thus our analysis shows a significant change in both
time periods. It is easy to check, by performing these analyses for each variable separately, that
there is no significant change in T-4 cell count in either of the two time periods, there is a
significant change in TMHR score in the first 90-day period only, and there is a significant
change in Karnofsky score in both time periods.
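The P-values quoted for the AIDS data can be checked from the chi-square survival function. The sketch below, which uses only the Python standard library, evaluates it for integer degrees of freedom through the recurrence $Q(a+1, x) = Q(a, x) + x^a e^{-x} / \Gamma(a+1)$ for the regularized upper incomplete gamma function; the function name `chi2_sf` is an illustrative choice.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(X > x) for a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Q(1, half) = exp(-half) is the survival function for 2 degrees of freedom.
        a, q = 1.0, math.exp(-half)
    else:
        # Q(1/2, half) = erfc(sqrt(half)) is the survival function for 1 degree of freedom.
        a, q = 0.5, math.erfc(math.sqrt(half))
    # Step a up by one until it reaches df / 2.
    while a < df / 2.0 - 1e-9:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


p_overall = chi2_sf(161.1373, 6)   # all six contrasts
p_first = chi2_sf(140.3697, 3)     # first 90 days
p_second = chi2_sf(11.7403, 3)     # second 90 days
print(p_overall, p_first, p_second)
```

The first two values are far below 0.0001, and the third agrees with the reported 0.0083 to four decimal places.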

One last thing we want to determine in this analysis is whether the change detected is in
the right direction. For an improving patient the TMHR score should decrease, the Karnofsky
score should increase, and the T-4 cell count should increase as well. To determine this we
fit the following linear model for the mean:

\[
\mu_{jx} = \beta_{0j} + \beta_{1j} x, \quad x = 1, 2, 3, \text{ and } j = 1, 2, 3.
\]

For an improving patient we want $\beta_{11} < 0$, $\beta_{12} > 0$ and $\beta_{13} > 0$. Fitting this model yields
$\hat\beta_{11} = -0.6289 < 0$ and $\hat\beta_{12} = 0.6614 > 0$, which have the correct signs for indicating an
improvement. However, $\hat\beta_{13} = -0.0562 < 0$, indicating that the change in T-4 cell count is not
in the correct direction. But as mentioned above, the analysis of T-4 cell counts showed no
significant change in these counts. Since this was an experimental drug, it is possible that it
was not effective in controlling the AIDS virus from every perspective.
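As a rough plausibility check on the signs of the slopes, one can fit an ordinary least-squares line through the three estimated means of each variable reported earlier in this section. This is only an illustration under that simplifying assumption; the slopes in the text come from refitting the model with the quasi-least squares algorithm, not from this shortcut.

```python
# Estimated means for the AIDS data (variable index runs fastest within time).
mu_hat = [2.1836, 6.3198, 1.1770, 0.8488, 7.3486, 1.2093, 0.9259, 7.6426, 1.0645]


def ols_slope(y):
    """Least-squares slope of y against x = 1, 2, ..., len(y)."""
    n = len(y)
    x = list(range(1, n + 1))
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# mu_hat[j::3] picks out the three time points of variable j + 1.
slopes = [ols_slope(mu_hat[j::3]) for j in range(3)]
print([round(b, 4) for b in slopes])
```

The three slopes come out negative, positive, and negative for TMHR score, Karnofsky score, and T-4 cell count respectively, and they are close to the fitted values quoted above.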


5.2  Zullo’s  Dental  Data 


To further illustrate the testing of various hypotheses, we use Zullo’s dental data, which appear
in Table 7.2 of Timm (1980). These data were also analyzed by Naik and Rao (1997), who assumed
a Kronecker product structure for the covariance matrix of the observations
on an individual but used maximum likelihood theory.

The study was concerned with the relative effectiveness of two orthopedic adjustments
of the mandible. Nine subjects were assigned to each of the two orthopedic treatments, say $T_1$
and $T_2$ ($g = 2$, $n_1 = 9$, $n_2 = 9$), called activator treatments. Measurements were made on
three characteristics ($p = 3$), namely SOr-Me (in mm), ANS-Me (in mm), and Pal-MP angle
(in degrees), to assess the changes in the vertical position of the mandible at three time points
($t = 3$) of activator treatment. The three null hypotheses of interest are: there is no group by
time interaction, there is no group effect, and there is no time effect.

Suppose $y_{ijkl}$ is the observation on the $k$th variable at the $l$th occasion corresponding
to the $i$th individual in the $j$th group. We assume the following model for the expected value
$E(y_{ijkl}) = \mu_{jkl}$:

\[
\mu_{jkl} = \mathrm{var}_k + \mathrm{group}_{jk} + \mathrm{time}_{kl} + (\mathrm{group} * \mathrm{time})_{jkl}.
\]


To express the above model in the standard form $E(y_i) = X_i\beta$, we first divide the
observations corresponding to the three variables, SOr-Me, ANS-Me, and Pal-MP angle, by 7.34, 4.76,
and 5.56 respectively. Next we define the following dummy variables:

\[
x_{v1} = \begin{cases} 1 & \text{if the observation is on variable 1} \\ 0 & \text{otherwise,} \end{cases}
\]

\[
x_{v2} = \begin{cases} 1 & \text{if the observation is on variable 2} \\ 0 & \text{otherwise, and} \end{cases}
\]

\[
x_{v3} = \begin{cases} 1 & \text{if the observation is on variable 3} \\ 0 & \text{otherwise.} \end{cases}
\]

The coefficient of each of these in the model will represent the unstructured mean of that
variable. Next let

\[
x_g = \begin{cases} 1 & \text{if the observation is from group } T_2 \text{ and} \\ -1 & \text{if it is from } T_1. \end{cases}
\]

Testing that the coefficient of $x_g$ in the model is zero will test the hypothesis that there is no
group effect. To test for the time effect, let

\[
x_{t1} = \begin{cases} 1 & \text{if the time period is two} \\ -1 & \text{if the time period is one} \\ 0 & \text{otherwise, and} \end{cases}
\]

\[
x_{t2} = \begin{cases} 1 & \text{if the time period is three} \\ -1 & \text{if the time period is one} \\ 0 & \text{otherwise.} \end{cases}
\]

Now the null hypothesis of no time effect is the same as testing that the regression coefficients
corresponding to $x_{t1}$ and $x_{t2}$ are simultaneously zero. Finally, the no-interaction hypothesis
can be tested by testing that the regression coefficients corresponding to the products $x_g * x_{t1}$
and $x_g * x_{t2}$ are zero. The estimates of the parameters corresponding to these eight independent
variables are

\[
\hat\beta = (16.7591, 13.6798, 4.4253, 0.0385, 0.1674, 0.1105, -0.0032, 0.0009)'.
\]
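The coding of the eight regressors can be sketched as follows. The column order (three variable indicators, group, two time codes, two interaction products) is an assumption read off from the order in which the text introduces the variables, and `design_row` and `fitted_mean` are hypothetical helper names.

```python
# Reported estimates, assumed to be ordered as
# (x_v1, x_v2, x_v3, x_g, x_t1, x_t2, x_g * x_t1, x_g * x_t2).
BETA_HAT = [16.7591, 13.6798, 4.4253, 0.0385, 0.1674, 0.1105, -0.0032, 0.0009]


def design_row(variable, group, period):
    """Regressor values for one observation.

    variable is 1, 2 or 3; group is "T1" or "T2"; period is 1, 2 or 3.
    """
    x_v = [1 if variable == v else 0 for v in (1, 2, 3)]
    x_g = 1 if group == "T2" else -1
    x_t1 = {1: -1, 2: 1}.get(period, 0)   # period two versus period one
    x_t2 = {1: -1, 3: 1}.get(period, 0)   # period three versus period one
    return x_v + [x_g, x_t1, x_t2, x_g * x_t1, x_g * x_t2]


def fitted_mean(row, beta=BETA_HAT):
    """Inner product of a design row with the coefficient vector."""
    return sum(x * b for x, b in zip(row, beta))


row_a = design_row(1, "T2", 2)
row_b = design_row(1, "T1", 1)
print(row_a, round(fitted_mean(row_a), 4))
print(row_b, round(fitted_mean(row_b), 4))
```

The fitted means are on the scaled units obtained after dividing each variable by the constants given above.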


With the assumption $\mathrm{Cov}(y_i) = \phi\,(R_t(\alpha) \otimes R_p(\gamma))$, where $R_t(\alpha)$ is the AR(1)
correlation matrix and $R_p(\gamma) = R_p$ is the unstructured correlation matrix, the estimates of
the correlation parameters are:

\[
\hat R_p = \begin{pmatrix}
1.0000 & 0.7478 & 0.0264 \\
0.7478 & 1.0000 & 0.3364 \\
0.0264 & 0.3364 & 1.0000
\end{pmatrix},
\]

$\hat\alpha = 0.9381$, and $\hat\phi = 0.9678$.


The value of the test statistic for testing no interaction is 0.0136, indicating no interaction
between the treatment groups and the time periods. Similarly, the test for no group effect
shows no significance, with a test statistic value of 0.4861. Only the time effect is
significant, with a test statistic value of 23.8208; the corresponding P-value based on the
chi-square distribution with two degrees of freedom is 0.0001.

6  Concluding  Remarks 

In this paper we discussed the analysis of multivariate repeated measures data assuming that
the covariance matrix of the repeated measurements on each subject is a scalar multiple of the
Kronecker product of two correlation matrices. The method used is quasi-least squares,
which makes no assumptions about the distribution of the random errors except for the
existence of the first two moments. We have suggested an algorithm for computing the estimates
for finite samples, proved consistency and asymptotic normality of the estimators for large
samples, and suggested tests for any linear hypothesis. Finally, we have applied
these results to two real-life data sets. Since the quasi-least squares method uses the solution
of a set of best (optimal if the data are normal) unbiased estimating equations, it is one of the
best procedures for analyzing these data without making any distributional assumptions.